The Journal of 
Experimental Education 


A periodical report of scientific investigations relating to child development, 
curriculum, ee ng ota gna Aan measurements, 





XIV DECEMBER, 1945 Number 2 





THE MEASUREMENT OF TEACHING ABILITY 


CONTENTS 


A Study of the Relationship Between Teaching Procedures and Educational Outcomes: C. D. Jayne


Improvability of Teachers in Service: C. R. Von Eschen 
Personality and Teaching Efficiency: R. E. Gotham 
A Factor Analysis of Teacher Abilities: A. G. Hellfritzsch 




Impressions, Trends, and Further Research: A. S. Barr 





$5.00 a Year    Published Quarterly    $1.50 a Copy





Published by Dembar 114 &. Carroll Street, 
Madison 3, Wis.
Entered as second-class matter October 17, 1988 at the post office at Madison, Wisconsin, 
under the act of March 3, 1879. 





EDITORIAL BOARD 
A. S. Barr, Chairman, Professor of Education, University of Wisconsin, Madison 6, Wis. 


Arthur T. Jersild, Professor of Education, Teachers Col- 
lege, Columbia University, New York City. Editorially 
responsible for materials on child welfare, guidance, 
and development, published each December. 

Palmer O. Johnson, Professor of Education, University of 
Minnesota, Minneapolis, Minnesota. Editorially re- 
sponsible for materials on measurements, statistics, and 
methods of experimental research, published each March. 


CONTRIBUTING EDITORS 


eS on eee - Patent See S 

Clinic, Pennsylvania State College, State Col- 
lege, Pennsylvania. 
Gilbert L. Betts, of Graduate Research in Ed- 
ucation, Colorado College, Fort Collins, Colorado. 


Leo J. Brueckner, Professor of Education, University of 
Minnesota, Minneapolis, Minnesota.


Oscar K. Buros, Associate Professor of Education, Rutgers
University, New Brunswick, New Jersey.


G T. Professor of Educational Psycho! 
"“Dniversity fal Chee Chines , Dilinois. _ 


. Boyce Thom 
. Yonkers, New York. 
Leslie L. Amaciate Profesor of | Education, 
State College of Washington, Pullman, Washington. 
Sicheet B Coie Callngs Ratsinns Mescilantien Basel. 
Princeton, N. J. 


M. Prof of Educational Psychology, 
nent of Chet, Chica, 1 yp lll ve 
Edward Ea Cepeten, Chick 


partment,” 162 Fie rditnt Avenue, 


yond _ y W De 
s ar 
Larchmon 


t, New 


+ - Davis, Professor of Education, a of Bur- 
of Educational f Coloradof 


Research, University 


, 


Hari R. Director of om tion, Uni- 
pg ye gs iy jucation, Uni 


a cr a 
Beeps ky accra al 


MiService; Obie Sante Unheuly: Chae, Oe tie 


Col. 1 ie — Flanagen, Chief, Sib, Tnycholegical Dees. Of- 
fey een eeeee Air Forces, 
Wabegen 3s C. 


Cute V. Good 
of tdocntion tl Unireci of gee — 
aon: Onin 


Robert W. B. Jackson, Assistant Professor of Educational 
: Assistant | Director gm of Educa- 
tario College Education, Univer- 
wedi any hneadipes 
Harold E. 


¢ Roe of oP cofonnia: 


Ne tions, Universi of Callfornin Berkeley, 
‘uman A 

California. 
D. Welty Lefever, sags poy meg = = eet 
Southern California, Los Angeles, Californ 


Edward A. Lincoln, Consulti , Halifax, 
b taaieuae 


ing Lom, Associate Professor of Education, Teac 
Columbia University, New York City. 


A. R. Mead, Director of Educational Research, University
of Florida, 330 K. Yonge Building, Gainesville, Florida.


T. E. Newland, Lt. Comdr., USNR, 2702 Wisconsin Ave- 
raga Die oo. lh D. ¢. 
Odell, Associate Professor of Education, University
of Illinois, Urbana, Illinois.
Willard C. Olson, Professor of Education, Director of Research
in Child Development, University of Michigan,
Ann Arbor, Michigan.
W. E. Peik, Dean and Professor of Education, University
of Minnesota, Minneapolis, Minnesota.
S. L. Pressey, Professor of Educational Psychology, Ohio
State University, Columbus, Ohio.
Clarence E. Ragsdale, Professor of Education, University
of Wisconsin, Madison, Wisconsin.
William Reitz, Assistant Professor of ae 
of Examiner, Chief Statistician Air Cargo 
Research, Wayne University, Detroit 2, Michigan.
Henry D. Rinsland, Professor of Education and Director
of Educational Research, University of Oklahoma,
Norman, Oklahoma.
Robert T. Rock, Jr., Professor of 
Dept. of Psychology, eee 


versity, New York City. 


Hicioh Themgoen, Ciists of CORS Deedigeuns, Besant 
Associate, Yale University, New Haven, Connecticut. 
LS nag Pee ng, Pei my & Education, 
Teachers College, University, New York City. 
Herbert A. Ti Professor of Psychology, Ohio State 
University, Ohio. 


iis 5 2 ee. See ee oe 
University, Syracuse 10, New York. 


Ernest R. Professor of Education, New York University,
New York City.
of Educational 


D. A. Worcester, Chairman Department 
and Measurement, University of Nebraska, 
Lincoln, Nebraska. 


DEMOCRAT PRINTING COMPANY 
MADISON, WISCONSIN 













A STUDY OF THE RELATIONSHIP BETWEEN TEACHING 
PROCEDURES AND EDUCATIONAL OUTCOMES 


C. D. JAYNE 
University of Wyoming 


SECTION I 
STATEMENT OF THE PROBLEM 


The investigators were convinced from the 
preceding studies that teaching ability might, 
with improved instruments, be measured with 
a fair degree of accuracy. It is the purpose of 
this study to seek the relationship between 
observable teacher activities of the more 
specific sort and the changes produced in 
pupils as measured by tests. 


SIGNIFICANCE OF THE PROBLEM 


This study is of significance both in the 
field of teacher evaluation and teacher im- 
provement, largely for the light it should 
throw on the relative emphasis which should 
be placed in teacher evaluation and teacher 
education upon simple, observable teacher 
activities, as compared to more complex pat- 
terns of teaching. If it can be shown that 
there is a significant correlation, either posi- 
tive or negative, between specific activities 
and pupil gain, these activities become of pri- 
mary importance in teacher evaluation and in 
training programs; if the relation is insignifi- 
cant, then the evaluation of and training in 
specific activities as ordinarily pursued may 
appear less important. 


DEFINITION OF TERMS 


As used in this study the term “teaching 
procedures” refers to relatively simple, dis- 
tinct, observable teacher activities rather than 
to complex patterns of teaching into which 
these separate activities blend to form a total 
learning-teaching situation. We are thus here 
concerned with what might be called “pri- 
mary” teacher activities such as questioning, 
commenting, explaining, illustrating, etc. Two 
aspects of these teaching procedures are to be 
investigated: first their qualitative aspects— 
that is, the kinds of questions, comments, etc., 


used by the teacher, and second, their quan- 
titative aspects—that is, how frequently each 
is used. 


GENERAL PLAN OF THE STUDY 


The research here reported consists of two 
studies, one involving the teaching procedures 
of the teachers already described in Rostker’s 
study and for whom a criterion was devel- 
oped, and a second group involving ten 
teachers employed to check the findings of 
the first study. To get as precise a record as 
possible of the teaching procedures, sound 
recordings were made of each teacher’s teach- 
ing and these recordings analyzed for proce- 
dures. Following this a comparison was made 
of the teaching procedures of teachers of dif- 


ferent levels of efficiency. The general design 
and procedures employed in the first investi- 
gation are described first, after which the 
design and procedures of the second investi- 
gation are described. 


SECTION II 


THE DESIGN OF THE FIRST 
INVESTIGATION 


THE CRITERIA OF TEACHING ABILITY 


The criterion of teaching efficiency used in
the first of the two investigations herein reported
is based on measurable pupil change
produced by the teacher as described by
Rostker.¹ Information regarding the teachers
participating in the study is given in Table I
of his study, p. 10; information concerning
the size of the classes taught by these teachers
is given in Table II, p. 10; information relative
to the types of schools participating in
the study is given in Table III, p. 11; and a
summary of information regarding the measuring
instruments used is given pp. 12-22.

¹ The criteria of teaching ability used in this study were
worked out by Mr. L. E. Rostker. See "Measurement of
Teaching Ability: Study Number One," Journal of Experimental
Education, XIV (September, 1945), pp. 6-12; 22-39.


TABLE I


INFORMATION REGARDING LENGTH OF SAMPLES OF TEACHING AND FREQUENCY OF
OCCURRENCE OF SELECTED ACTIVITIES

[Columns: Teacher No.; Minutes Recorded of first unit; Minutes Recorded of second unit; Total Minutes Recorded. The individual entries are not legible.]


DEVELOPMENT OF CRITERIA SCORES 


Rostker has summarized the steps by which 
the criterion scores were developed on pages 
22-38 of his study. His final composite score 
was employed as the criterion in this study. 


THE SOUND RECORDS OF CLASS ACTIVITIES


The analysis of teaching from sound rec- 
ords presents the same problems as the anal- 
ysis of teaching from direct observation except 
that in direct observation the activity can be 
observed but once: it happens and is gone, 
and the analysis must be based upon a 
memory of what happened. In making an 
analysis from sound records, the verbal hap- 
penings of the class can be reproduced as 
often as desired, and studied at leisure. 

There are certain factors of the teaching
process which can be studied advantageously
from listening to the sound record itself, as
for instance, the speech factor. Aside from
speech, it seemed that the analysis of teaching
procedures could be made much more
readily and accurately from a typed transcription
than from listening to the record itself.
Thus the records were transcribed and the
analysis here presented is based for the most
part upon these typed transcriptions.

[Table I, continued. Columns: Number of Questions; Number of Comments; Number of Presentations; Number of teacher participations; Number of pupil participations. The individual entries are not legible.]


ADEQUACY OF THE SAMPLE 


The adequacy of the samples of teaching 
upon which this investigation is based may be 
considered from two viewpoints: 


1. Were the teachers included in the study 
an adequate sample of the teaching pop- 
ulation? 

2. Were the samples secured of each teach- 
er’s work adequate? 


The first question has been studied and 
reported upon elsewhere in connection with 
the description of the teachers participating 
in the study. 

In considering the adequacy of the sam- 
pling of each teacher’s work the actual size 
of the sample of different types of teaching 
activities which have been studied needs to 
be known as well as the actual length of each 
sample in minutes of teaching. Table I gives 
this information. It will be noted that in 17 







cases the sample consists of two recitations. 
In 6 cases the sample was a single recitation. 
The average length of the sample of each 
teacher's work was 41.2 minutes, and the 
range was from 21.8 minutes to 58.3 minutes. 
In four cases the sample was of less than 30 
minutes length. 

Since the UWH criterion scores are based 
on changes produced in the pupils over a 
period of about 6 months, and the U criterion 
score on changes produced in 26 days of 
teaching, it is obvious that the adequacy of 
our sample needs to be considered in respect 
to each. Since the UWH criterion score was 
based on pupil growth in more than mere 
knowledge of subject matter taught in the 
civics class and included such abilities as that 
of organizing subject matter and ability to 
apply generalizations to social studies events, 
it must be assumed that the total pupil 
growth over the six months period cannot be 
ascribed to work in the civics class alone. 
Thus it is impossible to say that the pupil
change in the UWH composite score came as
the result of a certain number of periods of 
work. Probably as accurate a statement as can 
be made is that a portion of the change, prob- 
ably the largest part of it, came as a result 
of about 120 periods of work in civics. Since 
this study is related only to the teaching pro- 
cedure employed during the recitation period, 
our sample should be compared to the time 
spent in recitation during those 120 periods 
rather than to the total time. We have no evi- 
dence as to just what this time would be in 
each class, but it may be roughly estimated 
that perhaps half the time was spent in class 
discussions and half in reading and other 
activities. Assuming this to be true, it seems 
that about 60 civics recitation periods, or a 
period of about 2400 minutes of recitation, 
may have taken place over the UWH gain 
period. Our average sample of a little over 40 
minutes would thus represent about 1.7% of 
the recitation time during the UWH gain 
period. It is obvious that this is a very rough 
approximation and that the ratio would prob- 
ably vary considerably from teacher to 
teacher, but it does give some idea of the size 
of the sample compared to the total period 
involved. 

In relation to the U criterion score the 
situation is quite different. Since the U test 
measured gain from the two three-week units 
taught, and since these were taught for 26 




periods, it is safe to assume most of the gain 
came from 26 periods of work. Again assum- 
ing that half the time was spent in discussion 
and half in other activities it would appear 
that about 13 periods or about 520 minutes 
were spent in class discussions. Of this we 
have samples averaging just over 40 minutes 
in length or a sample of about 7.7% of the 
total. 
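The two sampling percentages above follow from a few round figures, and the arithmetic can be retraced in a short sketch. The 40-minute period is not stated outright; it is inferred here from the text's "60 periods, or about 2400 minutes." The one-half share of class time given to recitation is the author's own rough estimate, and the function and constant names are illustrative only.

```python
# Sampling-fraction estimate, using only the round figures quoted in the text.
MINUTES_PER_PERIOD = 40   # assumption: implied by "60 periods ... about 2400 minutes"
RECITATION_SHARE = 0.5    # the text's rough guess: half of class time is discussion
SAMPLE_MINUTES = 40       # "a little over 40 minutes" recorded per teacher


def sample_fraction(total_periods: int) -> float:
    """Fraction of estimated recitation time covered by one teacher's recording."""
    recitation_minutes = total_periods * RECITATION_SHARE * MINUTES_PER_PERIOD
    return SAMPLE_MINUTES / recitation_minutes


uwh_fraction = sample_fraction(120)  # UWH gain period: about 120 civics periods
u_fraction = sample_fraction(26)     # U gain period: 26 periods

print(f"UWH criterion: {uwh_fraction:.1%} of recitation time sampled")
print(f"U criterion:   {u_fraction:.1%} of recitation time sampled")
print(f"U sample is {u_fraction / uwh_fraction:.1f} times larger, relatively")
```

Because every input is a rough estimate, the ratio between the two fractions comes out between 4.5 and 4.6 depending on where the rounding is done.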

Whether 1.7% of the teachers’ procedures, 
as in the case of the UWH criterion scores, 
and 7.7%, as in the case of the U criterion 
scores, are adequate samples of each teacher’s 
teaching would seem to depend upon the use 
to be made of the samples. If the samples 
were to be used as a study of teaching pat- 
terns of individual teachers they obviously 
would be inadequate. If one wished to study 
the teacher’s skill in making assignments, for 
example, the teaching sample would furnish 
only one or two instances of such patterns of 
activity, and since these patterns of activity 
are complex and extremely variable, there 
seems no question but that the samples would 
be entirely inadequate for this purpose. If the 
samples are to be used, however, to study 
specific teacher acts, the number of instances 
of these activities included in the sample 
becomes large as is shown by Table I. We 
have, thus, a sample of about 77 questions, 
61 comments, and 12 presentations, on the 
average, from the work of each teacher. 

While it is impossible to state dogmatically 
just what an adequate sample of teaching 
would be for the purposes of this study, two 
checks were set up to determine whether in- 
adequate sampling might have invalidated the 
results. First, the results obtained which were 
based on long time gain as measured by 
UWH scores were compared with results 
based on short time gains as measured by U 
scores. As pointed out above, the latter represents
a 4½ times larger sample in relation to
the time in which measured gains were produced.
The fact that the results from the
application of both sets of criterion scores
were essentially the same indicates that the
sampling in one case was probably as adequate
as in the other.²

A second check to determine whether the 
results of the study might be influenced by 


² An analysis of those correlations shows that of the items
from the activity check list correlated with the UWH criteria
only 5.3% were statistically significant. When the same items
were correlated with the U criteria 5.3% were statistically
significant.


inadequate samples was made through a sec- 
ond investigation described in the next sec- 
tion. It merely needs to be noted here that 
in the second investigation the experiment 
was set up in such a way that a complete 
record of all discussion procedures over the 
experimental period which produced the cri- 
terion score was obtained. Thus, instead of a 
record of 1.7% or 7.7% of the total recita- 
tion procedures, a complete record of all 
teacher activities was obtained. When the re- 
sults of this second study are compared with 
the first it is to be seen that they are in 
essential agreement.³ It therefore seems that 
for the purposes of this study the samples 
have been of adequate size since large changes 
in the size of the sample made no significant 
difference in the results. 


TABLE II


RELIABILITY OF ACTIVITY ANALYSIS MADE OF
EACH OF TWELVE ACTIVITIES


Activity                                        Reliability
Total number of teacher comments
Times "yes" used
Total number of teacher questions
Per cent of teacher talk
Number class routine questions
Average words used in asking questions
Number of pupil participations
Average length of pupil participation
Per cent prepared fact questions
Per cent recall specified fact questions            .46
(one item not legible)
Number unprepared fact questions                    .41

Median                                              .80

[Reliabilities for the remaining items, and the original item order, are not legible.]

A further check on the adequacy of the 
samples of teaching was made by calculating 
the reliability of the index scores from twelve 
items of the check list. These items sampled 
each major division of the activity check list, 
and included activities with both high and low 
frequencies of occurrence. The reliability for 
each item was found by calculating the co- 
efficient of correlation between the series of 
16 activity scores, representing the score of 
each teacher on that activity from the first 
sample, and the corresponding series of activ- 
ity index scores from the second sample. 
After these were calculated for half the 
sample, the Spearman-Brown prophecy formula
was used to calculate the reliability of


*That the two one state axe fe ee apmeets 6 80 Be 




the total sample. The reliabilities so found 
for the 12 items are shown in Table II. 
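The step-up described above, from the correlation between the two half-samples to the reliability of the whole sample, can be sketched as follows. This uses the standard Spearman-Brown prophecy formula; the function name and the half-sample correlations in the loop are hypothetical and are not values from the study.

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Predicted reliability when a sample is lengthened by `length_factor`.

    With length_factor = 2 this is the usual split-half step-up:
    r_full = 2 * r_half / (1 + r_half).
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


# Hypothetical half-sample correlations, for illustration only.
for r_half in (0.26, 0.50, 0.67, 0.90):
    print(f"r_half = {r_half:.2f}  ->  r_full = {spearman_brown(r_half):.2f}")
```

The formula assumes the two halves are equivalent samples of the same behavior, which is itself a judgment call for recitations recorded on different units.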

As indicated in Table II the reliabilities 
found ranged from .41 to .95 with a median 
value of .80. These reliabilities compare 
favorably with the reliability of tests com- 
monly used for group analyses. 


SECTION III 


THE DESIGN OF THE SECOND 
INVESTIGATION 


To check the findings of the first investi- 
gation and to provide additional data bearing 
on the relation of teaching procedures to pupil 
gain, a second investigation was conducted in 
April, 1940. 

This second investigation provided for an 
intensive study of ten classes, each taught by 
a different teacher under as nearly identical 
conditions as could be experimentally set up. 
The lesson taught was a unit on Alaska and 
the results were measured by a pretest and a 
final test. This provided a situation in which 
the investigator was able to obtain a very 
complete record of the work of the period of 
one recitation. 


The pupil population for this second inves- 
tigation was drawn for convenience from the 
sixth grade and junior high school of the 
Central State Teachers College Training 
School in Stevens Point, Wisconsin. In all, 
105 pupils participated in the study, but due 
to some loss in the equating of groups and to 
absences, the results of the study are based 
upon 95 pupil cases. 

The pupils were organized into ten classes 
equated on the basis of (1) grade placement, 
(2) score on Alaska pretest, (3) I.Q., and 
(4) reading test score. 

The test on Alaska, used as pretest, final 
test, and delayed recall test, comprised 127 
items devised by the author. 

The I. Q. scores used in equating the 
classes were secured from the permanent rec- 
ords in the Training School files with the 
exception of those for 30 students who had 
entered the school since the last mental test 
had been administered to their class. The test 
regularly used in the Training School was the 
Henmon-Nelson Test of Mental Ability.⁴


ee ee eee See wy oS ee, 
Director of the Training School 







Those students who had no I. Q. score on file 
were given the Terman Group Test of Mental 
Ability for Grades Seven to Twelve.⁵

The reading scores used in this test were 
likewise taken from reading tests given in the 
Sixth Grade and Junior High School in Jan- 
uary and February of 1940 as a part of the 
school’s regular testing program. The test 
used in all grades was the Pressey Diagnostic 
Reading Test. 


TABLE III


MEANS AND SIGMAS FOR EACH OF TEN EXPERIMENTAL CLASSES ON EACH OF THE FOUR
FACTORS USED IN EQUATING OF CLASSES


             Alaska Test        I. Q.           I. Q.
Class No.       Mean            Mean            Sigma
    1        81.4 ± 3.3     111.5 ± 6.4     17.9 ± 4.5
    2        82.7 ± 2.2     108.5 ± 6.3     19.5 ± 4.3
    3        82.2 ± 2.9     108.5 ± 4.9     15.2 ± 3.4
    4        82.7 ± 3.4     110.3 ± 6.4     17.9 ± 4.5
    5        83.6 ± 3.7     109.6 ± 3.0      9.4 ± 2.1
    6        82.0 ± 1.9     109.6 ± 3.0      9.3 ± 2.1
    7        81.7 ± 2.8     111.3 ± 4.7     14.3 ± 3.4
    8        83.8 ± 1.7     109.9 ± 6.4     20.2 ± 4.5
    9        82.0 ± 3.1     110.8 ± 4.8     15.2 ± 3.4
   10        82.9 ± 3.2     107.4 ± 5.3     16.6 ± 3.7

[The Grade Level, Alaska Test sigma, and Reading "G" Score columns are not legible; the column assignment shown is inferred from the values quoted in the text.]


Study of Table III will show that the
classes were equated so that the differences in
the class means were less than one standard
deviation in relation to Grade Level, the
Alaska test, and I. Q. There was a slightly
greater spread in the reading means as compared
to their standard deviations, and two of
the highest means (classes 6 and 7), with means
of 7.7 and 7.8 respectively, fall just beyond
one standard deviation from the lowest mean,
which was that of classes 8 and 10 with a
mean of 7.2 and a standard deviation of .4.
Since the spread in reading ability between
the best class and the poorest was a matter
of only six months, and since an analysis of
class gains showed that the class with the
highest reading mean ranked 5th while one
of the classes with the lowest mean ranked
3rd, it would seem that so far as the means
were concerned the classes were equated
closely enough so that differences in class
gains cannot be ascribed to differences in
mean reading ability. It is generally recognized
that to be comparable, classes need not
only possess equal ability as expressed by
means, but they should also have an equal
spread of ability. In this respect our groups
were not as well equated as might be wished.
Only in the case of grade level are the sigmas
of the best and poorest classes within one
standard deviation. In the Alaska test the
range is from 5.4 ± 1.2 in class 8, to 11.6 ±
2.6 in class 5; and in I. Q. the range is even
greater, for the sigma of the most uniform
class, number 6, was but 9.3 ± 2.1 while
that of the most diverse class, number 8, was
20.2 ± 4.5.

⁵ This test was given by Mr. Burton Pierce, Principal of
the Junior High School.


It may be said that the procedure used in 
equating the groups will largely explain the 
fact that the classes were much better equated 
in regard to means than to sigmas. The equat- 
ing was done before the classes were taught 
so as to preserve as large a pupil population 
as possible. Some of the test scores used in 
equating were not obtainable until within a 
few days before the lessons were scheduled to 
be taught. The author started organizing the 
groups giving major attention to the means. 
This took longer than was expected and it 
was possible to give only incidental consid- 
eration to class spread. 

In order to determine whether the variation 
in spread of ability influenced the results of 
the experiment, the classes were ranked in 
order of their sigmas in each of the equating 
factors. These rankings were correlated by the 
Spearman formula

$$\rho = 1 - \frac{6 \sum D^{2}}{N(N^{2} - 1)}$$

with the rank in mean pupil gain. The results
are given below: 


There seems to be little relationship be- 
tween the range of ability within the classes 







so far as the Alaska test and I. Q. are con- 
cerned. Such relationship as there is would 
be in favor of homogeneous groups. The rela- 
tively high positive correlation in the case of 
reading would seem to indicate that the 
classes with the greatest range in ability made 
the greatest gains. Since this is quite contrary 
to what should be expected, a further check 
was made of this matter. For a large number 
of test items not mentioned in any way in the 
class but discussed in the text, it was found 
that the rank of the classes in gain made on 
items they had only read about was correlated 
with the rank in size of the sigma from the 
reading scores to the extent of r = .3. This 
relationship based entirely on gains made 
from reading is much more significant than 
the relationship based on gains made from all 
procedures. 
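The rank-correlation check used in this section, in which classes are ranked by the sigma of an equating factor and again by mean pupil gain, can be sketched as below. The formula is the standard Spearman rank-difference formula; the ten sigma and gain values are hypothetical stand-ins, not figures from the study, and the helper names are illustrative.

```python
def ranks(values):
    """Rank values with 1 = largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for i in order[pos:end + 1]:
            result[i] = shared
        pos = end + 1
    return result


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)); exact only when there are no ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical class sigmas on one equating factor and hypothetical mean gains.
class_sigmas = [0.40, 0.55, 0.35, 0.60, 0.45, 0.30, 0.50, 0.65, 0.38, 0.42]
class_gains = [14.1, 16.0, 12.5, 15.2, 13.0, 11.8, 17.3, 15.9, 12.9, 13.6]

print(f"rho = {spearman_rho(class_sigmas, class_gains):.2f}")
```

With only ten classes, a coefficient of moderate size can easily arise by chance, which is one reason to read such a check cautiously.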

There is thus probably nothing in the data 
available to show that there was anything in 
either the mean ability, or spread of ability 
within a class, which should have influenced 
significantly the outcome of the experiment. 

The teachers taking part in the investiga- 
tion were carefully selected with the purpose 
of securing as wide a range of teaching ability 
as possible. Table IV gives information re- 
garding the teachers used in the study. 

It will be observed that one teacher was a 
member of the college faculty, teaching 
courses in geography and in methods in this 
field who had until recently been a critic 




teacher in the social science field. A second 
teacher was a critic teacher supervising prac- 
tice in geography in the Teacher’s College 
Training School. Two teachers were public 
school teachers in Stevens Point public schools 
and both were ranked high by their superin- 
tendent. Two were student teachers in the 
Training School who had several years of 
teaching experience in rural and state graded 
schools. Two were inexperienced student 
teachers but were rated as strong by their 
critic, and two were inexperienced teachers 
ranked by their critic as average or below, as 
compared to other beginning practice teach- 
ers. The range in teaching ability was thus 
thought to be considerable. 


The subject matter for the experimental 
lesson was selected rather arbitrarily. It was 
desired to have a unit of work small enough 
to be covered in a single class period, and a 
unit on which there was plenty of material 
available, but one which had not been studied 
recently by any of the pupils taking part in 
the study. The topic on Alaska seemed to 
meet these requirements. 


Since teachers were to be measured in terms 
of pupil gain it was felt that objectives should 
be carefully defined and drawn up in a man- 
ner to define the teacher’s task. The measur- 
ing instruments used attempted to measure 
pupil achievement as related to these objec- 
tives. 


TABLE IV

INFORMATION REGARDING PARTICIPATING TEACHERS

Teacher  April 1940
No.      Teaching Position                 General Rating
1        Principal and grade teacher in    Rated by Supt. as being a very good teacher. Has
         Ward School, Stevens Point.       degree from Teachers College and has done
                                           graduate work at the University.
2        Principal and grade teacher in    Rated by Superintendent as an outstanding teacher.
         Ward School.
3        Student Teacher                   Rated by Critic as strong practice teacher. Practice
                                           grade A. Has several years of teaching experience
                                           in rural schools.
4        Student Teacher                   Rated by Critic as weak. Practice grade first
                                           semester B. No experience.
5        Student Teacher                   Rated by Critic as strong. Practice grade A. No
                                           experience.
6        Critic Teacher                    Second year as critic teacher in Teachers College;
                                           geography teaching in Junior High.
7        Student Teacher                   Practice Grade first semester A. No experience.
8        Student Teacher                   Practice Grade first semester A. No experience.
9        Student Teacher                   Practice Grade first semester B. Had 2 years of
                                           teaching experience.
10       College Instructor in Geography   Formerly a critic teacher supervising geography on
                                           Junior High level.










The writer, therefore, prepared the follow- 
ing statement of objectives, and at a meeting 
of all teachers a week before the experimental 
classes were to be taught, gave each teacher 
a copy and discussed it with them: 

1. To insure pupil mastery of the factual 
information given in the textbook. 

2. To insure pupil understanding of the 
geographical relationships discussed in the 
textbook. 

3. To develop in the pupils the ability to 
locate on an outline map the following: 


Ketchikan 
Juneau 
Cordova 
Seward 
Anchorage 
Fairbanks 


Yukon River 
Aleutian Islands 
Bering Sea 
Bering Strait 
Arctic Ocean 
Kodiak Island 


4. To promote pupil initiative in raising 
questions, volunteering pertinent information, 
etc. 


5. To work toward the establishment of 
correct oral expression on the part of the 
pupils. 

6. To develop an interest in knowing more 
about Alaska. 

The chief measuring instrument used in this 
investigation was a test on Alaska prepared 
by the author. The first section of the test 
covered the place locations that were listed 
in the teaching objectives. The remainder of 
the test, which had 127 items in all, was de- 
voted to factual information and relationships 
given in the textbook. Thus the test actually 
covered the first three of the teaching objec- 
tives set up. The reliability of the test, as a 
pretest was found to be .84, as a final test .79, 
and as a delayed recall test .77. 

It was felt that objectives 4 and 5 could 
best be evaluated from the transcription of 
the lesson itself, but there was actually so 
little evidence of work toward these objectives 
in any of the classes that no further study 
was made of them. 

A questionnaire study was made in an 
attempt to get some measure of the interest in 
Alaska as stimulated by each teacher. 

The Alaska pretest was given a week before 
the experimental classes were taught so as to 
use the test results in the equating of the 
classes. The classes were taught on a Friday 
and the Alaska test was repeated the follow- 
ing Monday to the entire group of pupils. 




The same test was given a third time five 
weeks after the teaching of the lesson as a 
measure of retention. 
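The three administrations of the test yield two simple per-class figures, immediate gain and gain still present five weeks later. A minimal sketch follows, assuming gain is taken as the raw difference in items right; the study's exact scoring is not given here, and all scores and class labels below are hypothetical.

```python
from statistics import mean

# Hypothetical scores per pupil: (pretest, final test, delayed recall test),
# each counted as items right out of 127.
classes = {
    "class A": [(80, 101, 96), (85, 99, 97), (78, 95, 90)],
    "class B": [(82, 93, 91), (79, 92, 88), (84, 97, 95)],
}


def class_summary(pupils):
    """Return (mean immediate gain, mean gain retained at the delayed test)."""
    gain = mean(final - pre for pre, final, _ in pupils)
    retained = mean(delayed - pre for pre, _, delayed in pupils)
    return gain, retained


summaries = {name: class_summary(pupils) for name, pupils in classes.items()}

# Rank the classes (and so their teachers) by mean immediate gain, largest first.
ranked = sorted(summaries, key=lambda name: summaries[name][0], reverse=True)

for name in ranked:
    gain, retained = summaries[name]
    print(f"{name}: mean gain {gain:.1f}, retained after five weeks {retained:.1f}")
```

Ranking classes on mean gain is only meaningful to the extent that the classes were equated beforehand, which is why the equating procedure receives so much attention in this section.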


It will be noted that the testing program 
was so conducted that pupils from all classes 
were tested together as a group in each of the 
three applications of the test. The first test 
was given without previous announcement, 
with no explanation of its purpose and with 
no indication that there would be any follow- 
up work of any kind. In the teaching of the 
classes no mention was made of the fact that 
there was to be a second test given. The test 
on the Monday following the teaching came 
as a surprise to the pupils. Thus the possi- 
bility that some ambitious pupils might study 
Alaska over the weekend in order to make a 
high grade was ruled out. The questionnaire
filled out by pupils on the Tuesday after the
lessons were taught indicates that some of
them did talk to their parents about Alaska,
and some checked on one or two points, but
there was no evidence that anyone made any
systematic effort to gather more information.
It is probably safe to say that whatever
learning took place over the weekend was due
to the interest created by the teacher and by
the nature of the experiment.


The third application of the test was not 
announced previously, nor was Alaska dis- 
cussed in any of the classes during the period 
between the giving of the second and third 
tests. 


RECORDING THE LESSONS 


The lessons were all taught on Friday, 
April 12, 1940. The College studio was used 
as a classroom because of its advantages for 
the making of sound records of the recita- 
tions. The pupils were seated on two rows of 
chairs near the front of the room. A black- 
board, wall maps and globe were conveniently 
placed. Back of the pupils were tables where 
observers were seated to observe the class and 
to make notes. The microphone was placed 
between the teacher’s desk and the pupils. 


Except for the microphone and the knowl- 
edge that a record was being made, the teach- 
ing situation was not unusual for the pupils 
taking part. It will be remembered that they 
were all pupils from the college Training
School, and new teachers, new procedures, and
visitors in the classroom taking notes, were 
common experiences for them. 







An effort was made by use of several items 
in a questionnaire to find out how the pupils 
felt about the naturalness of the teaching 
situation. Twenty-nine admitted they were a 
little nervous at the start of the lesson, while 
sixty-five claimed they were not. Only eleven 
said they felt nervous most of the time. 
Seventy-six of the pupils, or more than 75%, 
thought that they participated in the class 
just about as usual. 

Each class consisted of an eight minute 
introduction in which the teacher might moti- 
vate her work and make her assignment, or 
use otherwise as she saw fit. After the intro- 
duction, which was all recorded, the class 
went to an adjoining study room where refer- 
ences, wall maps, and a globe were available. 
The study room was in charge of the same 
individual for all ten classes and insofar as 
possible the study conditions were kept identi- 
cal for all classes. No help was given to pupils 
during this period other than to point out the 
materials available. The assistant in charge 
reported that every pupil worked industri- 
ously every period. The condition of the ex- 
periment provided a motivating force which 
caused every pupil to do his best. 

After twenty minutes of study the pupils 
returned to the studio and a recitation period 
was conducted in any way the teacher wished 
to proceed. This recitation period lasted for 
twenty minutes and was all recorded. 

The records made under the conditions of 
this study were definitely superior to those 
made in the first investigation. These records 
were found to be 97.7 percent transcribable— 
that is only 2.3 words per hundred were not 
understandable. 

The results by classes for the pretest, 
immediate recall test, delayed recall test, and 
the immediate and retained pupil gains are 
shown in Table V. 




Since the classes as organized had been 
equated so that we can assume them to have 
been of about equal ability to learn, since all 
teachers were striving for the same objectives, 
and all classes were taught under as nearly 
identical conditions as it was possible to 
secure, then the success of the teacher can be 
measured by the extent to which the objec- 
tives of the lesson were accomplished. The 
Alaska test was designed to measure pupil 
gain in the direction of those objectives and 
hence we are justified, insofar as this test 
is a valid measure, in taking pupil gain from 
the Alaska test as our criterion of teaching 
success. The study provides two sets of gain 
scores—one based on immediate recall, and 
one based on delayed recall. It might be 
argued that the retention of material over a 
five week period is a better criterion of teach- 
ing ability than is the immediate recall. While 
admitting that this is probably true, it was 
also recognized that there was a greater 
opportunity for uncontrolled factors to affect 
the results of the delayed recall gains. It was 
decided therefore to use both sets of scores 
and to compare teacher activities with each 
set of criterion scores. 
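
The two criterion scores can be illustrated with a small computation. The sketch below is only an illustration: the function name `mean_gains` and the five-pupil scores are hypothetical and are not data from the study. It assumes that a class's gain is the difference between its mean recall score and its mean pretest score.

```python
from statistics import mean


def mean_gains(pretest, immediate, delayed):
    """Return (immediate gain, retained gain) as differences of class means."""
    pre = mean(pretest)
    return mean(immediate) - pre, mean(delayed) - pre


# Hypothetical scores for one class of five pupils (not data from the study).
pretest = [80, 82, 79, 85, 84]
immediate = [92, 95, 90, 97, 96]
delayed = [88, 91, 87, 93, 91]

immediate_gain, retained_gain = mean_gains(pretest, immediate, delayed)
print(f"immediate gain: {immediate_gain:.1f}")
print(f"retained gain:  {retained_gain:.1f}")
```

Each teacher would then receive two criterion scores, one for each kind of gain, which is why the later comparisons are reported against both.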


DIFFERENCES IN TEACHING ABILITY 


Whether or not the differences in class gains
indicated in Table V really indicate any
difference in teaching ability depends on two
factors. First, were the pupil gains large
enough to be statistically significant? Second,
were the differences in gain between classes
statistically significant? In other words, how
certain are we that the differences observed
in mean pupil gain are due to differences in
teaching ability and are not merely chance
differences?


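TABLE V

MEAN SCORES AND SIGMA OF MEANS FOR ALASKA TEST

[Rows are given in source order. "??" marks digits that are illegible in the source. The columns for delayed recall and for immediate and retained gain are not recoverable.]

      Pretest            Immediate recall
    Mean    Sigma        Mean    Sigma
    81.4    3.3          92.7    2.5
    82.7    2.2          97.3    4.2
    ??.2    2.9          92.3    3.5
    ??.1    3.4          91.7    6.9
    ??.6    3.0          98.8    2.3
    ??.0    1.9          95.5    2.8
    ??.7    2.8          94.3    2.5
    ??.8    1.7          98.1    2.4
    82.0    3.1          95.6    3.4
    ??.9    3.2          92.0    2.9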





The sigma of the difference between the
means of the pretests and the means of the
final tests was found by use of the formula⁶

$$\sigma_{(M_a - M_b)} = \frac{1}{N}\sqrt{\sum (a-b)^2 - \frac{\left[\sum (a-b)\right]^2}{N}}$$

The critical ratio technique was then employed
to determine whether the difference in
means was statistically significant. Table VI
shows the results. Since a critical ratio of
three or more indicates the probable existence
of a true difference and since all of the critical
ratios in this table are well above that,
it can be said that the pupil gains are statistically
significant, that they grow out of
changes produced in the pupils between the
applications of the test, rather than upon the
element of chance.

⁶ A. S. Barr and C. N. Mills, “A Short Method of Calculating
the Standard Error of the Difference of the Means of
Paired [...],” Journal of Experimental Education (March,
1937), p. 313.

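
The paired-score computation and the critical ratio described above can be sketched as follows. This is a minimal illustration, assuming the formula has the usual raw-score form for paired differences, (1/N) * sqrt(sum(d^2) - (sum(d))^2 / N), where d is the difference between a pupil's two scores. The function names and the five-pupil scores are hypothetical and are not data from the study.

```python
from math import sqrt


def paired_se_of_mean_difference(first, second):
    """Standard error of the mean difference for paired scores.

    Assumed raw-score form: (1/N) * sqrt(sum(d^2) - (sum(d))^2 / N),
    where d is each pupil's second score minus the first score.
    """
    if len(first) != len(second):
        raise ValueError("paired series must have equal length")
    n = len(first)
    diffs = [b - a for a, b in zip(first, second)]
    sum_d = sum(diffs)
    sum_d2 = sum(d * d for d in diffs)
    return sqrt(sum_d2 - sum_d ** 2 / n) / n


def critical_ratio(mean_difference, standard_error):
    """Ratio of a difference to its standard error."""
    return mean_difference / standard_error


# Hypothetical pretest and final scores for the same five pupils.
pretest = [80, 82, 79, 85, 84]
final = [92, 95, 90, 97, 96]

se = paired_se_of_mean_difference(pretest, final)
gain = sum(final) / len(final) - sum(pretest) / len(pretest)
cr = critical_ratio(gain, se)
print(f"gain = {gain:.1f}, sigma = {se:.3f}, critical ratio = {cr:.1f}")
print("meets the three-sigma standard:", cr >= 3)
```

Because the same pupils take both tests, the computation works from each pupil's own difference rather than from the two class sigmas separately.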


The statistical difference between the mean
gains made by different classes was investigated
by the same procedure except that the
formula

$$\sigma_{M_1 - M_2} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2}$$

was used, since the two series of scores were
independent of each other. Since a general
knowledge of the significance of these differences is
required, rather than an exact statement of
the difference between each class and each of
the other nine, we have calculated the differences
between class IV, which had the lowest


TABLE VI

INFORMATION CONCERNING STATISTICAL SIGNIFICANCE OF PUPIL GAIN BETWEEN TWO
APPLICATIONS OF ALASKA TEST

[Column headings legible in the source: M for T¹; M for T²; σ Diff. M of T¹ and T². Only part of the first column is legible: 81.4, 82.7, 82.2, 82.?, 83.?. The remaining entries are not recoverable.]


TABLE VII*

SIGNIFICANCE OF DIFFERENCES BETWEEN THE IMMEDIATE GAIN MADE BY CLASS IV, WHICH
RANKED LOWEST, AND EACH OF THE OTHER CLASSES IN THE SECOND STUDY

[Column headings: The Different Classes Compared with Class IV; Differences Between Mean of Class IV and Each of Other Class Means; Sigma of Difference; Critical Ratio. The entries are illegible in the source.]


* Editor’s Note: It is important for what follows that the differences between class means
be statistically significant. There are two ways of checking the differences: (1) as the author
has done; and (2) by noting the differences from class to class. The class to class differences
are: IV-X, .1; X-III, 1.0; III-I, 1.2; I-VII, 1.3; VII-VI, .9; VI-IX, .1; IX-II, .0; II-VIII, .7;
VIII-V, .9; with sigmas of the differences as follows: 2.38; 2.46; 2.42; 2.78; 3.01; 2.92; 2.80;
2.96; 4.38. These differences are not statistically significant.







immediate recall gain, and each of the others. 
Table VII shows the results. 
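
The comparison of two independent class means can be sketched in the same way. The example below is a minimal illustration, assuming the standard error of the difference is the square root of the sum of the squared standard errors of the two means. The function names and the numbers are hypothetical and are not taken from Table VII.

```python
from math import sqrt


def se_of_difference(se_a, se_b):
    """Standard error of the difference between two independent means."""
    return sqrt(se_a ** 2 + se_b ** 2)


def critical_ratio(mean_a, mean_b, se_a, se_b):
    """Difference between two means divided by its standard error."""
    return (mean_a - mean_b) / se_of_difference(se_a, se_b)


# Hypothetical mean gains and sigmas of the means for two classes.
cr = critical_ratio(12.0, 9.0, 3.0, 4.0)
print(f"critical ratio = {cr:.2f}")
print("meets the three-sigma standard:", abs(cr) >= 3)
```

With sigmas of this size a difference of three points in mean gain gives a critical ratio well under three, which is the situation the text describes for the ten classes.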

As will be observed from an examination 
of the critical ratios given in Table VII the 
differences in mean pupil gain between the ten 
classes are not large enough to be statistically 
significant—that is, it is not certain that the 
differences observed were due to differences 
in teaching rather than to the element of 
chance in the selection of samples. The differ- 
ence between Class IV, which made the lowest 
mean gain, and Class X, which was next to 
the lowest, is so small that if the experiment 
were repeated the chances are almost equal 
that their relative positions would be re- 
versed. It can be said, however, that on the 
whole, the differences between the mean gains 
of the classes, while not as large as should be 
desired for this study, are consistent. 

The rating of the ten teachers on the basis 
of pupil gain produced some surprising results 
as indicated by Table VIII. 




TABLE VIII

RANKING OF TEACHERS ON BASIS OF PUPIL GAIN

[The teacher numerals and a second rank column are illegible in the source.]

    Rank    Data Concerning Teacher
     1      Inexperienced student teacher
     2      Inexperienced student teacher
     3      Principal ward school
     4      Student teacher with experience
     5      Critic teacher
     6      Inexperienced student teacher
     7      Principal ward school
     8      Student teacher with experience
     9      College teacher of geography
    10      Inexperienced student teacher

As was pointed out previously, teachers
were selected with the purpose of getting as
great a range of teaching ability as possible.
As the evidence shows those teachers selected
because of their professional reputation as
successful teachers of geography did not rank
as high by the criteria scores as some
inexperienced student teachers.

This fact raises some questions pertinent
to this investigation. First, is it possible that
teacher number V, an inexperienced student
teacher, is actually a better teacher than
Teacher X, who has become a specialist in
the field of geography teaching and has won
recognition throughout the state for her
professional work? If this is so unreasonable as
to be impossible to accept, does it mean that
for some reason the criteria scores do not
represent true teaching ability? Does not the
result of the ranking on the basis of criteria
scores based on pupil gain, as compared with
the experience, training, and recognized
professional standing of the teachers, raise a
question as to whether the results of the
investigation can be accepted as reliable?

It seems that a number of explanations may
be offered which may or may not adequately
explain the above situation. First, it should
be recognized that the criterion scores from
this study cannot be interpreted as being
general indices of teaching ability. At best they
represent ability to teach a certain lesson,
under certain prescribed conditions, with the
objectives limited largely to textbook information.
It may be that the student teachers,
used to teaching according to general directions
set up by their critics, found it easier
to adjust their teaching to the experimental
situation. Second, it should be pointed out
that the experimental situation itself provided
a motivating force that called for the best
efforts of all of the pupils regardless of what
the teacher did. The presence of this extraneous
motivating force may have greatly minimized
factors of teacher personality, teacher
forcefulness, teacher ability to arouse and
hold interest, etc. If this is true, the less
mature and inexperienced teachers would
probably have benefitted by the situation.
Third, the more experienced teachers, selected
for professional competence, may have found
it more difficult to limit themselves to the
objectives set up dealing largely with mastery
of facts from the text. Fourth, it is the
writer’s impression, based on informal conferences
with the teachers, that the student


teachers on the whole, probably spent more 
time in planning and preparing their lesson 
than did some of the more experienced 
teachers. 

Two other sets of teacher ratings are avail- 
able which as a whole increase confidence in 
the criteria rankings. The first is the ranking 
based on pupil opinion of how well their 
teacher taught as indicated on a ten point 
scale, and the second is the ranking given by 
Harris on pupil gain as scored by the Barr- 
Harris Performance Record. Table IX com- 
pares these rankings with the ranking by the 
criterion scores. 

When one compares these sets of rankings 
and takes into account the prestige factor 
which doubtless influenced both the pupils 
and Harris in giving higher rating to teachers 
of recognized professional standing, the crite- 
rion scores seem reasonable. 


TABLE IX

COMPARISON OF RANKS BASED UPON PUPIL OPINION OF TEACHER
EFFICIENCY AND PUPIL GAIN

[Column headings legible in the source: Rank, immediate recall test; Rank, delayed recall test; Rank by pupil rating; Rank by Harris (total score); Rank by Harris. Only the first column of ranks is legible, reading in source order: 1, 3, 2, 4, 5, 6, 7, 8, 9, 10.]


SECTION IV

THE ACTIVITY ANALYSIS OF
TEACHING PROCEDURES

As stated previously this study has as its
chief purpose the comparison of definite,
observable teacher acts, and the results of
teaching. Thus far we have described the
experimental pattern, the derivation of criteria
for evaluating the results of teaching,
and the samples of teaching procedures
secured for this analysis. We are now ready
to discuss the analysis made of the procedures
found in the samples of teaching taken, and
their comparison with the several criteria of
teaching efficiencies employed in this study.

The first step in the analysis was the
preparation of a check list of some 184 items
describing various observable teacher-pupil
activities. The items included in this check
list were found in the literature dealing with
previous investigations of teaching procedures,
and in books on techniques of teaching. Other
items occurred to the writer as a result of
observing teachers at work and these were
added to the list. This list was in no way an
evaluated list; any observable teacher activity
about which the verbal record of the recitation
gave information was included in the list.

For the sake of convenience this check list
was organized into sections, each dealing with
a particular type of teacher activity. Thus
one section was devoted to “questions”
another to “comments”, etc.

With this check list at hand the next step
was to determine the frequency with which
each activity occurred in each of the thirty-nine
samples of teaching taken in investigation
number one, and in each of the ten
samples taken in investigation number two.


Previous research has demonstrated that 
the reliability of an activity analysis is in- 
creased when the number of items studied is 
small.⁷ It was decided, therefore, to go
through the mimeographed copies of the les- 
sons a number of times, each time concentrat- 
ing on but a single type of activity. Thus the 
mimeographed material was first gone through 
merely noting the number of teacher-pupil 
participations and the length of each. Next it 
was gone through checking only for English 
errors, another time checking only teacher 
comments and their classifications. The sec- 
tion of the activity list devoted to “questions” 


⁷ A. S. Barr and others, Supervision (New York: D.
Appleton-Century Co., 1938), p. 416.







was so large that the material was gone
through twice, once checking only questions
classified as “fact,” “thought,” “local,”
“prepared,” and “unprepared,” and a second time
to check the additional classifications.

The labor involved in determining the reli- 
ability of the analysis for each item in the 
list would have been so great that only cer- 
tain items which further study indicated were 
of significance to the investigation were 
selected for study. The reliability of the fre- 
quency count made of these items was deter- 
mined by repeating on a fresh copy of the 
mimeographed lesson, the analysis previously 
made. For purposes of checking the reliability 
of items only lessons in the second investiga- 
tion were used. Pairs of scores representing 
frequency of occurrence as found by each of 
the two counts were thus secured for each 
item in each lesson. Correlations were then 
run between the first series of frequencies and 
the second using the formula⁸

$$r = \frac{\sum SR - \dfrac{\sum S \sum R}{N}}{\sqrt{\left[\sum S^2 - \dfrac{(\sum S)^2}{N}\right]\left[\sum R^2 - \dfrac{(\sum R)^2}{N}\right]}}$$


The resulting coefficient may be considered as 
the coefficient of objectivity. 

Table X shows the coefficients of objec- 
tivity of certain selected items. 
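
The coefficient of objectivity can be illustrated with a short computation. The sketch below assumes the raw-score form of the product-moment formula, in which S and R are the frequencies from the first and second counts of the same item across lessons. The function name and the two series of counts are hypothetical and are not data from the study.

```python
from math import sqrt


def raw_score_r(first_count, second_count):
    """Product-moment r computed directly from raw sums (no deviations)."""
    if len(first_count) != len(second_count):
        raise ValueError("both counts must cover the same lessons")
    n = len(first_count)
    sum_s = sum(first_count)
    sum_r = sum(second_count)
    sum_sr = sum(s * r for s, r in zip(first_count, second_count))
    sum_s2 = sum(s * s for s in first_count)
    sum_r2 = sum(r * r for r in second_count)
    numerator = sum_sr - sum_s * sum_r / n
    denominator = sqrt((sum_s2 - sum_s ** 2 / n) * (sum_r2 - sum_r ** 2 / n))
    return numerator / denominator


# Hypothetical frequencies of one check-list item in five lessons,
# as tallied on the first and on the repeated analysis.
first_count = [12, 7, 20, 15, 9]
second_count = [11, 8, 19, 15, 10]

objectivity = raw_score_r(first_count, second_count)
print(f"coefficient of objectivity = {objectivity:.3f}")
```

A coefficient near 1.00 would mean that the second count reproduced the first almost exactly for that item.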




TABLE X

RELIABILITY OF ACTIVITY ANALYSIS OF SELECTED ITEMS

[Of the eleven rows in the source, only two item names are legible: “Number of fact questions” and “Recall of specified fact questions.” The coefficients are illegible.]

⁸ Leonard P. Ayres, Journal of Educational Research (March,
1920), p. 216.

After determining the frequency of occurrence
of each item in the check list, in each
recitation, as described above, the next step
was to transfer these data to large work
sheets so that the frequency of occurrence of
each item for each class could be seen at a
glance.

Due to the difference in the length of the
samples, varying as they did from 13.9
minutes to 34.8 minutes, the frequencies with
which activities occurred were meaningless
until reduced to an index number which
would eliminate the difference in the time
element. To accomplish this, all frequencies
were divided by the length of the lesson in
minutes.

As the material was examined at this point
the possibility presented itself that the ratio
between the frequency of a certain type of
question or comment used, and the total number
of questions or comments, might be a significant
index. Thus a second set of scores
was worked out showing what per cent the
frequency of each separate item was of the
total for the category of which it was a part.
Thus an index score of 48 indicates that 48% of all
questions asked were fact questions.

After carrying the analysis to this point it
was discovered that a large number of items
included in the original check list would have
to be dropped because they did not occur at
all or their frequency of occurrence was so
small as to make their further study fruitless.

After securing index scores indicating the
type and frequency of a large number of
specific teacher-pupil activities, the next step
was to compare these by means of a correlation
technique to each of the criteria scores
based on pupil gain. Such a procedure indicates
the extent of the relationship between
the frequency of occurrence of each particular
activity and the gains produced in the pupils.

For determining the correlation between the
various sets of index scores and the U and
UWH criteria scores in the first investigation,
and between the index scores and the immediate
and delayed recall criteria scores of the
second investigation, the Ayres formula was


used. In determining the correlation between
the series of index scores and the criterion
scores derived from supervisory ratings in
both the first and the second investigations,
the formula

$$\rho = 1 - \frac{6 \sum d^2}{N(N^2 - 1)}$$

was used as a matter of convenience. According
to Sorenson the difference between the results
obtained by use of this formula and those
obtained by the Pearson formula will
seldom exceed .02.⁹

While usable sound records were made of
twenty-three of the twenty-eight teachers in
the first investigation, it was impossible to use
this number in the correlation studies because
of the absence of certain criterion scores. Records
were made in three schools which failed
to complete the final testing program. Four
additional schools were dropped in the process
of developing criterion scores as explained on
page 34. This left a total of sixteen cases in
which to apply the correlation technique between
the series of index scores obtained on
the activity analysis and the U and UWH
criterion scores. “Expert evaluation” rankings
were secured on twenty-two teachers. Teacher
number XVII was omitted from this part of
the study because the lesson she taught had
not been transcribed at the time the evaluations
were made. Thus the correlations between
the series of index scores and the ranking
of the teachers by “expert evaluation” are
based on 22 cases.

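
The index scores and the rank-difference correlation described above can be sketched together. The example is a minimal illustration under stated assumptions: the function names, the counts, the lesson lengths, and the supervisory ranks are all hypothetical, and tied ranks are not handled.

```python
def per_minute_index(frequency, minutes):
    """Frequency of an activity per minute of recorded lesson."""
    return frequency / minutes


def percent_index(frequency, category_total):
    """Frequency of an item as a per cent of its whole category."""
    return 100.0 * frequency / category_total


def ranks(values):
    """Rank 1 goes to the largest value; assumes there are no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def rank_difference_rho(ranks_a, ranks_b):
    """Rank-difference coefficient: 1 - 6 * sum(d^2) / (N * (N^2 - 1))."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data for five lessons (not taken from the study).
fact_questions = [24, 35, 12, 27, 36]
minutes = [20.0, 25.0, 16.0, 18.0, 40.0]
supervisory_rank = [2, 3, 5, 1, 4]

index = [per_minute_index(f, m) for f, m in zip(fact_questions, minutes)]
index_rank = ranks(index)
rho = rank_difference_rho(index_rank, supervisory_rank)

print("per-minute index:", index)
print("index ranks:", index_rank)
print(f"rho = {rho:.2f}")
print("24 fact questions out of 50 questions:", percent_index(24, 50), "per cent")
```

Dividing by lesson length puts a 13.9-minute sample and a 34.8-minute sample on the same footing before any ranking or correlation is attempted.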


All correlations in the second investigation
were based on ten cases. In the application of
the correlation technique it has been established
that for short series, such as were used
in this study, there is a tendency for the
coefficient to be extraneously high, and for
the standard error to be extraneously low.
Formulas have been devised, however, to correct
this tendency.¹⁰ An analysis of the coefficients
obtained in this study, as will be made
in the next section, indicates however, that
most of the correlations were low and that
very few of them were statistically significant.
In fact a large part of the coefficients found
were so small that they were reduced to zero,
or had their sign changed, by the application
of the correction formula. The discussion
which follows is concerned with but few exceptions
with those coefficients of correlation
which remained greater than zero after correction.
Tables XII to XVIII have been prepared
to show the corrected correlations and
standard deviations for all such items.

⁹ Herbert Sorenson, Statistics for Students of Psychology
and Education (New York: McGraw-Hill Book Co., 1936).

¹⁰ See Albert E. Waugh, Elements of Statistical Method
(New York: McGraw-Hill Book Co., 1938), p. 253, for a
discussion of this point. The formulas given are

$$(r')^2 = 1 - (1 - r^2)\,\frac{n-1}{n-2}$$

(where r' denotes the corrected coefficient of correlation,
r the uncorrected coefficient of correlation, and n the number of
cases in the series), and

$$(S'_y)^2 = (S_y)^2\,\frac{n}{n-2}$$

(where S'_y is the corrected standard error of estimate, S_y the
uncorrected, and n the number of cases).
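
The effect of the correction on a short series can be illustrated numerically. The sketch below rests on an assumption: it takes the correction to have the form r'^2 = 1 - (1 - r^2)(n - 1)/(n - 2), which may not match the cited formula exactly. The function name is hypothetical, and a coefficient whose corrected square is zero or negative is simply returned as zero.

```python
from math import copysign, sqrt


def corrected_r(r, n):
    """Shrink r for a short series of n cases (assumed form of the correction).

    Returns 0.0 when the corrected square is zero or negative;
    otherwise keeps the sign of the uncorrected coefficient.
    """
    if n <= 2:
        raise ValueError("the correction needs more than two cases")
    r_squared = 1 - (1 - r * r) * (n - 1) / (n - 2)
    if r_squared <= 0:
        return 0.0
    return copysign(sqrt(r_squared), r)


# Ten cases, as in the second investigation.
for r in (0.70, 0.30, -0.70):
    print(f"r = {r:+.2f}  corrected = {corrected_r(r, 10):+.3f}")
```

Under this assumed form, with ten cases any uncorrected coefficient below about .33 in absolute value is reduced to zero, which agrees with the statement that a large part of the coefficients did not survive correction.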


Before considering the relationship between 
specific teaching procedures and the criterion 
scores, it may be worthwhile to get a general 
impression of the relationships found between 
teaching procedures on the whole, as classified 
in the check list used, and pupil gain. An ex- 
amination of Table XI will show that a total 
of 336 correlations were found between 84 
teaching procedures and the four “gain” cri- 
teria scores, while 168 more were found be- 
tween the same procedures and the two “ex- 
pert ranking” criteria, making 504 coefficients 
of correlation in all. Of the 336 correlations 
with pupil gain criteria, 191 were found to be 
smaller than the correction required for the 
shortness of the series, which left 145 items¹¹
with correlations greater than zero after appli- 
cation of the correction formula. Of these, 40 
had correlations smaller than their sigmas, 
and 50 more had correlations less than twice 
their sigmas, which left only 55 coefficients, 
or slightly over 16% of the total, with coeffi- 
cients more than twice their standard error. 
To be statistically significant, that is to be 
relatively certain that the relationship indi- 
cated by the coefficient is not due to chance, 
a coefficient should be at least three times as 
large as its standard error. 


Inspection of the corrected correlations
(Tables XII to XVIII) shows that only 20,
or about 8% of the total, had correlations 
large enough to be statistically significant. 
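
The tally given above can be checked with a few lines of arithmetic. The sketch below reproduces the counts stated in the text and adds a small helper, `band`, that sorts a coefficient by its ratio to its standard error; the helper's name and the two sample coefficients are hypothetical.

```python
def band(r, sigma):
    """Classify a coefficient by the ratio of its size to its standard error."""
    ratio = abs(r) / sigma
    if ratio < 1:
        return "below 1 sigma"
    if ratio < 2:
        return "1 to 2 sigma"
    if ratio < 3:
        return "2 to 3 sigma"
    return "3 sigma or more"


# Counts as stated in the text.
procedures = 84
gain_correlations = procedures * 4          # four pupil gain criteria
expert_correlations = procedures * 2        # two expert ranking criteria
total_correlations = gain_correlations + expert_correlations
positive_after_correction = gain_correlations - 191
above_two_sigma = positive_after_correction - 40 - 50
share = 100 * above_two_sigma / gain_correlations

print(gain_correlations, expert_correlations, total_correlations)
print(positive_after_correction, above_two_sigma, f"{share:.1f} per cent")
print(band(0.63, 0.20), "|", band(0.23, 0.24))
```

Only coefficients in the last band would meet the three-sigma standard that the text adopts for statistical significance.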


All of the procedures which were found to 
have a significant positive correlation with one 
or more pupil gain criterion scores are listed in 
Table XIX together with the correlations of 
each procedure with the other three criterion 
scores. The same procedure was followed in 
making Table XX, except that procedures 
were listed which had a significant negative 
correlation. In other words, Table XIX shows 

¹¹ See Tables XII to XVIII.







TABLE XI

CORRELATIONS FOUND BETWEEN VARIOUS INDEX SCORES OBTAINED FROM AN ANALYSIS OF
TEACHING PROCEDURES AND VARIOUS CRITERIA SCORES OF TEACHING

[Columns in the source, each giving r and σr. First Investigation: With U; With UWH; With Expert ranking. Second Investigation: With immediate recall; With delayed recall; With Expert ranking. The coefficients cannot be assigned to rows and columns from the source. The item labels that remain legible are listed below in source order; item numbers are given where legible.]

Teacher-pupil participation
    No. of words per minute
    Per cent of teacher talk

Questions
    No. expository questions
    Per cent expository questions
    7. No. judgment questions
    8. Per cent judgment questions
    No. “yes-no” questions
    Per cent “yes-no” questions
    No. short answer questions
    20. Per cent short answer questions
    No. short answers not discussed
    Per cent short answers not discussed
    29. Questions directed to last speaker
    30. Per cent questions directed to last speaker
    31. No one called on, no one answered
    32. Per cent no one called on, no one answered
    33. Prepared fact questions
    34. Per cent prepared fact questions
    35. Unprepared fact questions
    36. Per cent unprepared fact questions
    37. Local fact questions
    38. Per cent local fact questions
    39. Prepared thought questions
    44. Per cent local thought questions

IV. Teacher Comments
    Pupils praised
    Pupils criticized
    Teacher repeated answer in almost same words
    Teacher repeated answer in different words
    Teacher summarized answer
    Teacher interprets answer
    Corrects misstatement
    Asks more information
    Indicates answer partly right
    Corrects grammar
    Indicates answer wrong
    Indicates answer right
    Teacher gives answer
    Teacher asks “why”
    Suggests importance of contribution

V. Teacher Presentation (Number of Times)
    Teacher quotes material
    Teacher gives factual information
    Gives personal opinion
    Gives explanation
    Gives illustration
    Total no. presentations


TABLE XII

LIST OF ITEMS DEALING WITH TEACHER-PUPIL PARTICIPATION WHICH HAVE CORRELATIONS
SIGNIFICANTLY DIFFERENT FROM ZERO AFTER APPLICATION OF CORRECTION FORMULA

[Columns in the source: U, UWH, Immediate Recall, Delayed Recall. Legible entries are listed after each item in source order; their column positions are not recoverable.]

    Per cent of teacher talk                     .20, -.72
    Per cent of pupil talk                       -.18, .63
    Average length of teacher participation      .69
    Number of teacher participations             .26, .83
    Number of teacher questions                  .20



TABLE XIII

LIST OF ITEMS DEALING WITH KIND OF QUESTIONS ACCORDING TO “MENTAL PROCESS”
CLASSIFICATION WHICH HAVE CORRELATIONS SIGNIFICANTLY DIFFERENT FROM
ZERO AFTER APPLICATION OF CORRECTION FORMULA

[Columns in the source: U, UWH, Immediate Recall, Delayed Recall, each with r and σr. Legible entries are listed after each item in source order; their column positions are not recoverable.]

    Recall of specified facts                    .20, .24, .15, .15, .29
    Per cent of recall of specified facts        .20, .24, .20
    Number of expository                         .09, .24
    Number of selective recalls                  .38, .20
    Per cent of selective recalls                -.28
    Number of judgments                          .42, .20
    Per cent of judgments
    Number of recalls and organizations
    Per cent of recalls and organizations
    Number of class routines
    Per cent of class routines
    Number of rhetorical
    Per cent of rhetorical


TABLE XIV

LIST OF ITEMS DEALING WITH KIND OF QUESTION ACCORDING TO “FORM” CLASSIFICATION
WHICH HAVE CORRELATIONS SIGNIFICANTLY DIFFERENT FROM ZERO
AFTER APPLICATION OF CORRECTION FORMULA

[Column headings legible in the source: Immediate Recall and Delayed Recall, each with r and σr. Legible entries are listed after each item in source order; their column positions are not recoverable.]

    Number of “Yes-No”                                    .54, .21
    Per cent of “Yes-No”                                  .33
    Number of “Yes-No” not followed by discussion         .46, .23
    Per cent of “Yes-No” not followed by discussion
    Per cent of short answers
    Number of short answers not discussed
    Per cent of short answers not discussed



TABLE XV

LIST OF ITEMS DEALING WITH WAY IN WHICH PUPILS WERE CALLED ON WHICH HAVE
CORRELATIONS SIGNIFICANTLY DIFFERENT FROM ZERO AFTER
APPLICATION OF CORRECTION FORMULA

[Columns in the source: U, UWH, Immediate Recall, Delayed Recall. Only one entry, .24, is legible; its row and column cannot be determined with certainty.]

    Number of questions asked, pupil called on
    Per cent of questions asked, pupil called on
    Number of pupils called on, question asked
    Per cent of pupils called on, question asked
    Number of questions directed to last speaker
    Number of “No one called on, no one answered”
    Per cent of “No one called on, no one answered”
    Per cent of “No pupil called on, answered by volunteer”







TABLE XVI

LIST OF ITEMS DEALING WITH “FACT-THOUGHT” CLASSIFICATION OF QUESTIONS WHICH HAVE
CORRELATIONS SIGNIFICANTLY DIFFERENT FROM ZERO AFTER
APPLICATION OF CORRECTION FORMULA

[Column headings legible in the source: U, Immediate Recall, Delayed Recall, with r and σr. Legible entries are listed after each item in source order; their column positions are not recoverable.]

    Number of prepared fact questions            .43, .24
    Number of unprepared fact questions
    Per cent of unprepared fact questions
    Number of local fact questions
    Number of prepared thought questions
    Per cent of prepared thought questions
    Number of unprepared thought questions
    Per cent of unprepared thought questions
    Number of local thought questions
    Per cent of local thought questions


TABLE XVII

LIST OF ITEMS DEALING WITH TEACHERS COMMENTS WHICH HAVE CORRELATIONS SIGNIFICANTLY
DIFFERENT FROM ZERO AFTER APPLICATION OF CORRECTION FORMULA

[Columns in the source: U, UWH, Immediate Recall, Delayed Recall. Legible entries are listed after each item in source order; their column positions are not recoverable.]

    Pupil praised                                         .20, .32, .22
    Pupil encouraged to recite                            .24, .12, .24
    Teacher supplements pupils answer                     .18, .24
    Teacher repeats answer in almost exact words          .32, .22
    Teacher repeats answer in different words             .28, .24
    Summarized pupil answer                               .14, .24
    Clarifies, interprets, or limits pupil answer
    Generalizes pupil answer
    Asks class to evaluate
    Raises question as to correctness of answer
    Asks for more information
    Indicates answer partly right
    Indicates answer wrong
    Indicates answer right
    Teacher gives answer
    Teacher asks “why?”
    Suggests importance of contribution







TABLE XVIII

LIST OF ITEMS DEALING WITH TEACHERS PRESENTATIONS WHICH HAVE CORRELATIONS
SIGNIFICANTLY DIFFERENT FROM ZERO AFTER APPLICATION OF CORRECTION FORMULA

[Columns in the source: U, UWH, Immediate Recall, Delayed Recall, with r and σr. Legible entries are listed after each item in source order; their column positions are not recoverable. The labels of the first two rows are illegible.]

    [label illegible]                                     .05, .24
    [label illegible]                                     .53, .18, -.09, .29
    Gives personal opinion                                -.16
    Gives explanation                                     .34, .28
    Gives illustration
    Gives summary
    States generalization
    Number of times teacher presented material


TABLE XIX 


PROCEDURES WITH SIGNIFICANT POSITIVE CORRELATIONS AFTER APPLICATION OF CORRECTION 
FORMULA, WITH ONE OF PUPIL GAIN CRITERIA AND THEIR
CORRELATION WITH OTHER CRITERIA SCORES 


(Asterisk Indicates Significant Coefficient) 


First Second 
Investigation Investigation 
Immediate Delayed 
Procedures U UWH Recall Recall 
1. Per cent of pupil talk  .23  .00  .63*
2. Number teacher participations  .23  .37
3. Number recall of specified facts  .14
4. Number of short answers not discussed  .00
5. Per cent of short answers not discussed  .00  .65*
6. Per cent of unprepared fact questions  .54*  .00
7. Number prepared thought questions  .41  .60*
8. Per cent local thought questions  .00  .20
9. Raises questions as to correctness of answer  .61*  —.20
10. Corrects misstatement  .00  .73*
11. Suggests importance of contribution  .54*  .00


TABLE XX


PROCEDURES WITH SIGNIFICANT NEGATIVE CORRELATION WITH ONE OF PUPIL GAIN CRITERIA AND
THEIR CORRELATION WITH OTHER CRITERIA SCORES 


(Asterisk Indicates Significant Coefficient) 


First Second 
Investigation Investigation 
Immediate Delayed 
Procedures UWH Recall Recall 
. Large per cent of teacher talk ‘ . 20 .00 
. Average length of teacher participation ; ‘ . 00 .00 
. Per cent no one called, no one answered questions. _-_ ‘ . 38 . 00 
. Per cent of short answer questions , . 53 . 00 
. Number unprepared thought questions ‘ . 32 —.98* 
. Per cent unprepared thought questions : . 54 —. 56 
. Indicates answer wrong ‘ . 00 —.69* 







ures associated with teaching success 
according to at least one of our criteria, and 
Table XX shows procedures associated with 
poor learning according to at least one
criterion.
An examination of these tables shows the 
following facts: 


1. Not a single procedure was found to be 
significantly related to pupil gain in both 
studies. 

2. Only two were found to be statistically 
significant for both criterion scores in a single 
investigation, (“Raises question as to correct- 
ness of answer”, Table XIX and “Indicates 
answer wrong”, Table XX). 

3. In nine cases the relationship with the 
other criterion score of the same investigation 
was zero (after application of correction for 
shortness of the series) and in seven cases the
indicated relationship was in the same direction
but too small to be significant.

4. In fourteen cases correlations in the 
other study had the opposite sign of the one 
listed as significant; in sixteen cases the cor- 
relations in the other study were zero, and in 
six cases they had the same sign but were not 
large enough to be significant. It will thus be 
observed that there was a tendency on the 
whole for procedures associated with pupil 
gain in one study to either show no relation- 
ship to pupil gain in the other study (16 
cases), or to actually show the reverse rela- 
tionship (14 cases). This tendency is signifi- 
cant and will be discussed in more detail 
further on in our analysis. 
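
The tally in item 4 can be reproduced mechanically. The following is only a sketch: the function names and the example pairs are hypothetical and are not taken from the tables; each pair holds a coefficient significant in one study and the corrected coefficient for the same procedure in the other study.

```python
from collections import Counter


def classify(significant_r: float, other_r: float) -> str:
    """Compare a significant coefficient with its counterpart in the other study."""
    if other_r == 0:
        return "zero"
    same_direction = (significant_r > 0) == (other_r > 0)
    return "same sign" if same_direction else "opposite sign"


def tally(pairs):
    """Count how many pairs fall into each of the three categories."""
    return Counter(classify(sig, other) for sig, other in pairs)


# Hypothetical illustration only; these are not the study's actual pairs.
example_pairs = [(0.63, -0.20), (0.54, 0.00), (0.60, 0.41), (-0.69, 0.00)]
counts = tally(example_pairs)
print(counts["opposite sign"], counts["zero"], counts["same sign"])
```

Applied to the full set of significant coefficients, a count of this kind would yield the three figures reported above (fourteen, sixteen, and six).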

In view of the above data it seems clear 
that the contribution of items such as here 
studied either is not large enough to make 
itself felt or the errors of measurement are so 
great that statistically significant correlations 
were not secured. The evidence would seem 
to suggest that certain procedures are posi- 
tively associated with pupil gain of a certain 
kind and under certain conditions but if a 
different kind of pupil gain is considered, or 
if the teaching conditions are changed, the 
relationship may disappear or even become 
negative in character. 


THE RELATION BETWEEN VARIOUS INDEXES 
OF PARTICIPATION AND THE 
CRITERIA SCORES 


Those evaluating teaching often use the 
percent of teacher and pupil talk, length of 
participations, number of teacher questions, 




etc., as criteria of teaching efficiency. Data on 
such activities were included in our check lists 
and the relationship to the criterion scores 
studied. 

Consider first the relative amount of 
teacher and pupil participation as measured 
in terms of per cent of the total words spoken. 
Table XXI shows a great deal of variation 
between classes in this respect. For instance,
the teacher who ranked third highest by the
UWH criterion spoke but 2.2% of the words
in the two classes recorded, whereas the 
teacher who ranked second spoke 87.7% of 
the words in the two lessons. The average 
amount of teacher talk in all lessons of the 
first investigation was 38.7%. For a number 
of years teachers have been taught that more 
pupil activity and less teacher talk is needed, 
and some supervisors have used the ratio 
of teacher and pupil participation as an im- 
portant item in the evaluation of teaching. 
Because of the stress which has been placed 
on this during recent years, it is interesting 
to compare the ratio of teacher to pupil talk 
as found in three studies: Stevens in 1912, 
Barr in 1929, and the present study. Table 
XXII shows that Stevens found teachers 
doing 64% of the talking, Barr found his 
“good” teachers doing 52% and his “poor” 
teachers 56.7%. The present study shows 
teachers doing 38.7% of the talking. Thus 
there has been a gradual change in the direc- 
tion approved by educational theory, and 
teachers are today apparently talking much 
less than 30 years ago; probably, too, the re- 
sult depends upon the appropriateness of the 
teacher talk and what was said. 
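
The index used here is simply the teacher's share of all words spoken in the recorded lessons. A minimal sketch follows; the function name and the word counts are hypothetical, chosen only so that the result matches the 38.7% average quoted above.

```python
def percent_teacher_talk(teacher_words: int, pupil_words: int) -> float:
    """Return the teacher's words as a percentage of all words spoken."""
    total = teacher_words + pupil_words
    if total == 0:
        raise ValueError("no words were recorded for this lesson")
    return 100.0 * teacher_words / total


# Hypothetical word counts for one lesson (not taken from the study's records).
share = percent_teacher_talk(teacher_words=1935, pupil_words=3065)
print(round(share, 1))
```

The per cent of pupil talk is the complement of this figure, so either one fixes the other.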

The results from the first investigation are 
in conflict with those from the second in re- 
gard to the correlation between the amount 
of teacher talk and pupil gain. For both of 
the criteria scores based on pupil gain in the 
first study a positive, but not statistically sig- 
nificant correlation is shown with the amount 
of teacher talk. The average correlation for 
the two “gain” criteria is .24 (Table XX). In 
the second investigation the negative correla- 
tion with immediate gain was reduced to zero 
by the correction formula but there was a 
high negative correlation (—.72) with delayed
recall. It seems there is nothing in this
study which would indicate the ratio between 
the amount of teacher and pupil talk as being 
a critical index of good or poor teaching. The 
data show very low negative correlations, not 







TABLE XXI 
INDEX SCORES FOR SELECTED ITEMS FROM ACTIVITY ANALYSIS

Column headings: Teachers; Per cent of teacher talk; Per cent of pupil talk; Average length of teacher participations; Average length of pupil participations; Number of pupil participations; Number of teacher participations; Number of teacher questions; Words per minute.

[Table entries illegible in the scanned original.]
~] 


TABLE XXII 
COMPARISON OF SELECTED ITEMS FROM ACTIVITY ANALYSIS WITH STUDIES BY STEVENS AND BARR

                                                                       Barr       Barr
                                                            Present   “good”     “poor”
                                                  Stevens   Study     teachers   teachers
Number of questions per 30 min. period                       65        92.7      101.7
Per cent of teacher talk                                     38.7      52.0       56.7
Ave. length of teacher participations in sec.                13.7*     13.0       14.2
Average length of pupil participations in sec.               16.3*     14.3       10.9
Fact questions per 40 min.                                   29.3      45.        55.3
Thought questions per 40 min.                                16.1      16.6       11.2

* Estimated on basis of number of words per average teacher and pupil recitation and the average
number of words spoken per minute for each lesson.

** Based on only four lessons where exact time was given. For twenty periods the range in length
was from 30 to 45 minutes. Miss Stevens reports an average of 61.2 questions per recitation period.


large enough to remain greater than zero 
when corrected for the shortness of the series, 
between the gain criteria and the average 
length of pupil participations. There was thus 
no relationship discovered between the length 
of pupils’ participations and pupils’ gain in 
either of the studies. 

The average length of teacher participa- 
tions in the first study showed no significant 
relationship to pupil gain. In the second 
study, however, a statistically significant 
negative correlation was found between de- 
layed recall and length of teacher participa- 
tion. In this case, teachers who presented 
large amounts of material at one time, with- 
out interruption, produced smaller delayed 
recall gains than those who made many short 


participations. The correlation between imme- 
diate recall and the length of the average 
teacher participations was too small to remain 
greater than zero after correction. 

The number of teacher participations 
showed a positive correlation with all four of 
the criteria (Table XIX). Although the cor- 
relation with delayed recall is the only one 
statistically significant it seems probable that 
in this study those teachers who made many 
short participations were more successful on 
the whole than those who participated less 
frequently. Here again, however, the differences
in frequency of participation are so great,
when teachers of about equal ability by the 
criteria scores are compared, as to make it 
evident that this index alone could not be 







trusted to differentiate between good and poor 
teaching (Table XIX). Thus the teacher 
ranked as second best in the first investigation 
participated on the average 3.4 times each 
minute, which was next to the highest frequency
among the 16 teachers. The teacher
who rated third participated only once in ten
minutes or 1/34 as frequently, and yet the
difference in measured gain was small.
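
The 1/34 figure follows directly from the two rates just quoted. As a quick check, using only the figures in the text:

```latex
\[
\frac{\text{one participation per ten minutes}}{\text{3.4 participations per minute}}
  = \frac{0.1}{3.4} = \frac{1}{34}.
\]
```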

The correlations show no significant rela- 
tionship between the number of questions and 
the pupil gain (Table XIX). Since teachers 
have long been criticized for the large num- 
ber of questions they ask, it is interesting to 
compare the results of Stevens’ and of Barr’s 
studies of this point with the present investi- 
gation. Table XXII shows that Stevens in 
1912 found teachers asking over 100 ques- 
tions per 40 minute class period. Barr in 1929 
found his “good” teachers asking 92.7 ques- 
tions and his “poor” teachers 101.7 questions 
for a like period, while the present study 
showed an average of only 65. 

In this study there was no significant rela- 
tionship between the length of the question 
and the pupil gain. 

Table XI shows that in this study there 
was a small negative correlation between 
pupil gain and number of English errors. This 
coefficient was not statistically significant; in 
fact it was less than the correction demanded 
by the correction formula. Table XI also 
shows that the “move of the recitation” as 
measured by the average number of words per 
minute was not apparently correlated with 
pupil gain as application of the correction 
formula reduced all of these coefficients to 
zero or changed their sign. 

Forty-six different items concerned with 
the teachers’ use of questions were correlated 
with each of the four “gain” criteria scores 
and with the two “expert opinion” criteria. 
Not a single item of the forty-six showed a 
significant correlation with all of the gain 
criteria. A few items, however, show signifi- 
cant correlations with the two gain criteria 
of one investigation or the other. Table XI 
shows all the correlations found on question- 
ing procedures. Since most of them are not 
significant we shall discuss only selected items 
from the list. 

The number of questions demanding that 
the pupil merely recall a specified fact showed 
a low but positive correlation with gain cri- 
teria in the first investigation, and a much 


higher positive correlation in the second 
study. In fact, the correlation with immediate 
recall is statistically significant with a coeffi- 
cient of correlation of .71 and a standard 
deviation of .15 after correction (Table XI). 
The fact that the correlation is so much 
higher on immediate recall in the second 
study may be explained by the fact that the 
acquiring of textbook information was the 
chief objective in this study, and that more 
recall questions may have been more appro- 
priate than in the first study where broader 
objectives were set up. 

The next three items in Table XI, “num- 
ber of expository questions,” “number of 
selective recall,” and “per cent of selective 
recall,” had a correlation greater than their 
correction with only one of the criteria scores. 
None of these correlations were statistically 
significant. The number of judgment ques- 
tions showed a positive correlation with both 
of the gain criteria of the first investigation 
(Table XI). While neither is statistically 
significant they do suggest that in the first 
investigation teachers who were most success- 
ful asked, on the whole, more questions de- 
manding the pupils to make a judgment of 
some sort. The per cent of judgment questions 
showed a small positive correlation with im- 
mediate recall but a negative correlation of 
about the same size with delayed recall. The 
“number of recall and organization” ques- 
tions, the “per cent of recall and organiza- 
tion” questions, and the number of “class 
routine” questions, each showed a correlation 
greater than its correction with but one of 
the four “gain” criteria scores and none of 
these was significant. 

The per cent of “class routine” questions 
had a negative correlation of —.40 with the 
UWH criteria score, but smaller positive cor- 
relations with the “gain” criteria of the sec- 
ond investigation. The number of rhetorical 
questions (those asked by the teacher without 
expecting an answer, such as, “Now that 
would be dangerous, wouldn’t it?”) had small 
positive correlations in the first study and a 
small negative correlation in the second. 
When expressed in terms of per cent these 
correlations become fairly significant—each 
being more than twice its sigma. In fact the 
relationship between the number of rhetorical 
questions and the “gain” criteria in the first 
investigation was greater than that found for 
any other type of question in the “Mental 







Process” classification. The only explanation 
the author can suggest is that it is his impres- 
sion that teachers with considerable enthu- 
siasm and “pep” are more inclined to use this 
sort of question than are others. This would 
not, of course, explain the negative correlation 
obtained in the second study. 
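
The phrase "more than twice its sigma" can be made concrete with a short sketch. The text does not state which formula was used for the sigma of a coefficient, so the classical large-sample expression, sigma_r = (1 - r^2) / sqrt(N), is assumed here; the function names and the example values are likewise hypothetical.

```python
import math


def sigma_r(r: float, n: int) -> float:
    """Classical standard error of a correlation coefficient (assumed formula)."""
    return (1.0 - r * r) / math.sqrt(n)


def exceeds_twice_sigma(r: float, n: int) -> bool:
    """True when the coefficient is more than twice its standard error."""
    return abs(r) > 2.0 * sigma_r(r, n)


# Hypothetical coefficients for a series of 16 teachers.
print(exceeds_twice_sigma(0.60, 16), exceeds_twice_sigma(0.30, 16))
```

With so few cases the standard error is large, which is one reason only sizable coefficients survive the test in this study.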

The analysis of the correlations shown in 
Table XI may be summarized by stating that, 

on the whole, there seems to be very little

relationship between the teachers’ use of the 
types of questions classified and the gains 
made by the pupils. 

Table XIV gives a list of procedures in- 
volving the use of questions that call for a 
“yes” or a “no” answer or that can be appro- 
priately answered with a few words and their 
correlation with gain criteria. As a whole, 
teachers have been taught to avoid the use of 
such questions unless they were to be followed 
up by a discussion in which the pupil had to 
defend his answer. 

While the correlations found between the 
frequency of use, or per cent of use, of this 
type of question and pupil gain were, with 
only a few exceptions, low, it is interesting to 
note that of the sixteen greater than zero 
after correction, fourteen were positive. That 
is, such evidence as we have would suggest 
that these generally frowned upon “yes—no” 
and short answer questions seem positively 
related to pupil gain. This is true of those 
questions not followed by discussion as well 
as those that were followed by discussion. 

So far as the results of this investigation 
would indicate there is little relationship be- 
tween the way in which pupils are called on 
and pupil gain as is shown in Table XV. Of 
the 36 correlations found in this series only 
14 were greater than zero after correction and 
only one of the fourteen was statistically sig- 
nificant. 

No relationship between the number of 
fact questions covering material over which 
the pupils had definitely prepared and pupil 
gain was found in the first study. In the sec- 
ond study the correlation of this index with 
immediate gain was .43 which is fairly sig- 
nificant. This may be due to the fact, as has 
been pointed out before, that factual ques- 
tions on textbook material may have been a 
more appropriate procedure in the second 
study due to the type of objectives set up. 

The number of fact questions dealing with 
material not definitely assigned shows a cor- 




relation of .30 and .38 with the U and UWH 
criteria respectively (Table XVI) which, 
while not statistically significant, is much 
higher than the correlations found for fact 
questions over prepared material. This would 
suggest that in the first investigation, where 
the measured results of teaching covered more 
than mere textbook learning, the questions 
growing out of pupil experiences and interests 
were more productive of gain than questions 
which merely quizzed over material read. On 
the other hand, the evidence from the second 
study, where the objective was mastery of 
textbook material, is conflicting. When cor- 
related with immediate gain a small negative
correlation (—.12) was obtained which was 
smaller than its correction, but with the de- 
layed gain the correlation was .42 after cor- 
rection. When the per cent of such questions 
is correlated with pupil gain the correlations 
are in each case slightly higher than when 
the mere number of questions is considered. 
The above results for the second study would 
suggest that for immediate recall, questions 
over prepared textbook material were more 
effective than questions dealing with non- 
textbook material, but that for delayed recall 
the reverse was true. 

The number of questions which were in the 
form of thought questions but which were 
based on memory of materials read, show a 
negative correlation of —.41 with the U score. 
On the other hand in the second study the 
correlations are .60 and .56 with immediate 
recall and delayed recall criteria respectively. 
Here again the difference, negative in one 
study and significantly positive in the other, 
might be logically ascribed to the different 
objectives in the two studies which might 
affect the appropriateness of this type of 
question. 

The number of thought questions on the 
other hand which grew out of the pupils’ own 
interests and experiences in the course of the 
discussion showed a relatively high correla- 
tion with gain in the first study (.40 and .32 
with U and UWH respectively, Table XVI) 
and very high negative correlations in the 
second (—.98 and —.54 with immediate and 
delayed recall respectively). The consistency 
with which the results are reversed in the two 
investigations when questions on prepared or
unprepared material are considered would seem
to substantiate the theory already advanced 
that one type of question may have been 




appropriate in the first study and another 
type appropriate in the second. 

The number of thought questions related 
to the local environment of the children was 
found to have a positive correlation with all 
four gain criteria although in no case is it 
statistically significant. 

In considering the relationship of teacher’s 
comments to pupil gain, while some 24 types 
of teacher comment were correlated with the 
criterion scores, none possessed statistical sig- 
nificance, in relation to all of the four criteria. 
Three were statistically significant when com- 
pared to the criteria scores of the first study, 
and one when compared to the criteria scores 
of the second study. We shall consider these 
first. 


Comments which raised a question as to
the correctness of a pupil’s contribution were
found to have a correlation of .64 and .61
with the UWH and U criteria respectively
(Table XVII). This would suggest that teachers
who were rather critical of pupil responses
produced greater gains. It is interesting to
note that this same item in the second study
showed a negative correlation of —.20 with
immediate recall. It seems probable that in
the second study, since discussion was based
almost entirely on text material just read,
a large number of comments questioning
the correctness of a pupil response
indicated lack of mastery of the material read,
and the criterion score was based entirely on
mastery of text material. Thus again the difference
in objectives may explain the difference
in the results obtained in the two studies.

The number of times teachers suggested
the importance of a pupil contribution had a
positive correlation that was significant, but
only five teachers made comments of this sort.

In summary there seems to be but little
relation between the type of comments teachers
make and pupil gain. To get further evidence
on this point the average correlation
was found for four types of comments generally
accepted as good and the average for six
types of comments generally considered as
poor. These data are shown in Table XXIII.


TABLE XXIII
CORRELATION BETWEEN UWH CRITERIA SCORES AND “GOOD” AND “POOR” COMMENTS

“Good” Comments
[first entry illegible]
Class asked to evaluate answer
Raised question as to correctness
Teacher asks “why?”

“Poor” Comments
[first entry illegible]
Teacher repeated answer in same words
Teacher repeated answer in different words
“All right” used  —.06
[entry illegible]  .41
[entry illegible]  .14


While the median value of the correlations
of “good” procedures is slightly higher than
that for the “poor”, the difference is so
small and the overlapping is so great that it
would seem to make little difference in pupil
gain whether comments from the “good” or
“poor” grouping were used. The relationship
which seems to be most significant in this
category is that between comments raising a
question as to the correctness of a response
and the UWH and U criteria scores.

The possibility of forming some kind of a
composite score by combining a number of
the items in the activity list was suggested
and investigated on the theory that there
might be a higher relationship between a
larger number of related activities and teaching
success as indicated by measured pupil
gain than that found for the individual items
here studied.

An inspection of the coefficients of correla- 
tion shown in Table XI seemed to indicate 
that in the first study the items which had 
the most significant positive correlation with 
pupil gain were those having to do with the 
extent to which questions were based on pupil 
interest and experience rather than on 
assigned material, the extent to which teachers 
contributed explanations and factual informa- 
tion, the extent to which the teacher chal- 
lenged pupils to support their ideas, and the 
amount of spontaneous pupil discussion. It 
seems that all of these items have to do in 
one way or another with the making of the 
discussion period a period of meaningful ex- 
change of ideas rather than a teacher quiz of 
pupils over assigned materials. The composite 
score formed from these items will be referred 
to as the “Index of Meaningful Discussion”.
Table XXIV shows the seven items included 







TABLE XXIV 
CORRELATIONS BETWEEN SEVEN ITEMS USED IN INDEX OF MEANINGFUL DISCUSSION AND
UWH AND U

1. Per cent of fact questions on unprepared material
2. Per cent of thought questions on unprepared material
3. Per cent of thought questions dealing with local situations
4. Number of participations growing out of spontaneous pupil discussion
5. Number of teacher explanations
6. Number of times teacher presented factual information
7. Times teacher raised a question as to correctness of a pupil response  .67


in the index of meaningful discussion and the 
correlation of each with the UWH and the U 
criteria scores. 

The first step in building up the composite 
score was to change all of the frequency 
scores from Table A* used in the composite,
as shown in Table A*, into standard scores,
Table A*. This was done by dividing each
score by the sigma of the series in which it 
was found. The standard scores so obtained 
for each of the seven items were added to give 
an unweighted composite for each of the 
teachers as shown in Table XXV. The un- 
weighted composite scores so found showed a 
correlation of .63 with the UWH criterion 
score and a correlation of .63 with the U cri- 
terion score as is indicated in Table XXVII. 
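
The procedure just described (divide each score by the sigma of its series, add across items, correlate the sum with a criterion) can be sketched as follows. This is an illustration under stated assumptions, not the author's computation: the function names and data are hypothetical, the population form of sigma is assumed, and the product-moment coefficient is assumed for the final correlation.

```python
import math
import statistics


def standard_scores(series):
    """Divide each score by the sigma of its own series, as the text describes."""
    sigma = statistics.pstdev(series)  # population sigma is an assumption
    return [x / sigma for x in series]


def unweighted_composite(items):
    """Add the standard scores of every item, teacher by teacher."""
    scaled = [standard_scores(series) for series in items]
    return [sum(values) for values in zip(*scaled)]


def pearson_r(x, y):
    """Product-moment correlation between two equally long series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical frequencies: two items observed for four teachers.
items = [[2, 4, 6, 8], [1, 1, 3, 3]]
criterion = [10.0, 12.0, 15.0, 19.0]  # hypothetical criterion scores
composite = unweighted_composite(items)
r = pearson_r(composite, criterion)
print([round(c, 2) for c in composite], round(r, 2))
```

Dividing by sigma puts items counted on very different scales on a common footing before they are added, so that no single frequent activity dominates the sum.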

Since some of the items used in the com- 
posite score had a considerably higher cor- 
relation with the UWH score than others, it 
was decided to form a composite score where 
each item would be weighted in proportion to 
the size of its coefficient of correlation with 
the UWH criteria scores. To accomplish this, 
each of the standard scores was multiplied by 
the appropriate coefficient of correlation. The 
weighted scores so obtained are given in 
Table XXV. This composite weighted index 
of meaningful discussion formed by adding 
the weighted standard scores for each teacher 
was found to have a correlation of .81 with 
the UWH scores and a correlation of .39 with 
the U scores (See Table XXVII). 
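
The weighting step can be sketched in the same way. The sketch is self-contained; the function name, the frequencies, and the weights are hypothetical, and the weights stand in for each item's coefficient of correlation with the UWH criterion.

```python
import statistics


def weighted_composite(items, weights):
    """Sum each teacher's standard scores, each multiplied by its item weight."""
    if len(items) != len(weights):
        raise ValueError("one weight is needed for each item")
    totals = [0.0] * len(items[0])
    for series, weight in zip(items, weights):
        sigma = statistics.pstdev(series)  # population sigma is an assumption
        for i, score in enumerate(series):
            totals[i] += weight * score / sigma
    return totals


# Hypothetical frequencies for two items and four teachers.
items = [[1, 1, 3, 3], [2, 2, 6, 6]]
weights = [0.50, 0.25]  # hypothetical coefficients used as weights
weighted = weighted_composite(items, weights)
print(weighted)
```

Because the weights are derived from the same UWH scores against which the composite is afterwards correlated, some rise in that particular coefficient is to be expected from the weighting alone.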

Under the conditions of the first investiga- 
tion it appears that there is a relatively high 
positive relationship between the index of 
meaningful discussion and pupil gain. Those 
teachers who had the most discussion on 
material growing out of pupil interest and 
experience, rather than on material assigned 
to be learned; who were active in bringing in 
additional information and in explaining
points not clear; who challenged pupils to
prove their points, and in whose classes there
was considerable spontaneous pupil discussion,
were found to have been most successful
as measured in terms of pupil gain.

* In original report on file, Library, University of Wisconsin.

The index of meaningful discussion was 
next computed as described above for each of 
the ten lessons in the second study as indi- 
cated in Tables A,, A,, in original thesis and 
Table XXVI. The composite scores so 
found based on standard scores had a correla- 
tion of —.66 with the immediate gain score 
and a correlation of —.55 with delayed gain 
score. The composite scores formed from 
weighted standard scores had a correlation of 
—.67 and —.68 with immediate and delayed 
recall respectively as may be seen in Table 
XXVII. 

It is obvious that these results are in con- 
tradiction to those obtained in the first study. 
The first study showed a high positive cor- 
relation. In the first study those activities in- 
cluded in the index of meaningful discussion 
were performed most frequently by the teach- 
ers who produced the greatest gains; in the 
second most frequently by teachers who pro- 
duced the least gains. This would seem to in- 
dicate that the activities included in the com- 
posite score were appropriate and effective in 
the first study; that they were inappropriate 
and ineffective in the second. A comparison 
of the teaching situation in the two studies 
will show that this difference in the appro- 
priateness of teaching activities might be 
expected. 

It will be recalled that the objectives set up 
in the first study were broad and called for 
much more than the mere recall of facts. The 
testing program was set up to measure as 
accurately as possible these broader objectives.†
In this situation it would seem logical
that those activities included in the index of


† The item securing the test weight in the criterion
was the Wrightstone Scale of Civic Beliefs.




TABLE XXV 


COMPOSITE INDEX OF MEANINGFUL DISCUSSION (IMD) AND INDEX OF IMMEDIATE RECALL (IIR)
(First Investigation)


Composite    Composite    Composite    Composite
IMD from     IMD from     IIR from     IIR from
Standard     Weighted     Standard     Weighted
Scores       Standard     Scores       Standard
             Scores                    Scores

16.26        9.25          .84         2.56
12.17        6.56
11.20        5.79

[Remaining entries illegible in the scanned original.]


TABLE XXVI 


COMPOSITE INDEX OF MEANINGFUL DISCUSSION (IMD) AND INDEX OF IMMEDIATE RECALL (IIR)
(Second Investigation)


Composite    Composite    Composite    Composite
IMD from     IMD from     IIR from     IIR from
Standard     Weighted     Standard     Weighted
Scores       Standard     Scores       Standard
             Scores                    Scores

3.21  2.94
5.20  7.55
7.86  3.06

[Remaining entries illegible in the scanned original.]


TABLE XXVII 


COEFFICIENTS OF CORRELATION BETWEEN VARIOUS COMPOSITE INDEX SCORES 
AND CRITERION SCORES 


Composite    Composite    Composite    Composite
IMD* from    IMD from     IIR* from    IIR from
Standard     Weighted     Standard     Weighted
Scores       Standard     Scores       Standard
             Scores                    Scores

UWH  .81  .19
U  .39  —.35
Expert Opinion (first study)  —.75  —.19
Immediate Recall  —.67  .82
Delayed Recall  —.68  .58
Expert Opinion (second study)  _42  —.31


*Key—IMD indicates ‘‘ Index of Meaningful Discussion.” 
IIR indicates ‘‘ Index of Immediate Recall.” 







TABLE XXVIII 


TABLE OF WEIGHTED SCORES FOR SEVEN ITEMS FROM ACTIVITY ANALYSIS USED IN FORMING 
“INDEX OF MEANINGFUL DISCUSSION COMPOSITE SCORE” 
(First Investigation) 


Teacher No.   Item 1   Item 2   Item 3   Item 4   Item 5   Item 6   Item 7

[Table entries illegible in the scanned original.]


TABLE XXIX 


TABLE OF WEIGHTED SCORES FOR SEVEN ITEMS FROM ACTIVITY ANALYSIS USED IN FORMING
“INDEX OF MEANINGFUL DISCUSSION COMPOSITE SCORE”
(Second Investigation)


Teacher No.   Item 1   Item 2   Item 3   Item 4   Item 5   Item 6   Item 7

[Table entries illegible in the scanned original.]


meaningful discussion would be highly appropriate.
On the other hand the second study
was set up so that the learning of textbook
material was the major objective and the
testing program was based entirely on ability
to recall factual material included in the text.
In this situation it would seem inappropriate
to do anything but to discuss the textbook
material; to bring in additional information,
or to allow the discussion to follow pupil
leads, or to raise questions about material not
studied, would not only fail to aid the pupils
in making a gain on the test, but would dissipate
their attention and time so they would
actually have less chance to make a gain.

In view of the above data it seemed reasonable
to suppose that a composite score made
up of items relating to mere recall of assigned
material should have a positive correlation
with pupil gain in the second study, where
the emphasis was on recall of factual information,
and a much lower or negative correlation
with the UWH and U scores derived
from the first investigation. To test this 
hypothesis a composite score was built up by 
the same procedure as was used to secure the 
index of meaningful discussion using the fol- 
lowing items from the activity analysis: 
(1) Questions demanding recall of specified 
fact; (2) number of factual questions on pre- 
pared material; (3) number of thought ques- 
tions on prepared material; (4) number of 
times teacher indicated answer right. 
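The procedure just described, combining standard scores on several activity items into one index and correlating that index with pupil gain, can be sketched as follows. This is only an illustration under stated assumptions: the item frequencies and gains are hypothetical, the function names are invented for the example, and equal weights are used unless others are supplied; the weights actually employed in the study are not reproduced here.

```python
from statistics import mean, pstdev

def standard_scores(values):
    """Convert raw frequencies to standard (z) scores."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def composite_index(item_columns, weights=None):
    """Sum the (optionally weighted) standard scores of each teacher across items."""
    if weights is None:
        weights = [1.0] * len(item_columns)
    z_columns = [standard_scores(col) for col in item_columns]
    n_teachers = len(item_columns[0])
    return [sum(w * z[i] for w, z in zip(weights, z_columns))
            for i in range(n_teachers)]

def pearson_r(x, y):
    """Product-moment correlation between two equally long series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical frequencies of four recall items for five teachers
# (one inner list per item, one entry per teacher).
items = [
    [12, 20, 15, 30, 25],
    [8, 14, 10, 22, 18],
    [3, 5, 4, 9, 6],
    [10, 18, 12, 27, 20],
]
# Hypothetical mean pupil gain for the same five teachers.
gain = [4.0, 6.5, 5.0, 9.0, 7.5]

index = composite_index(items)
r = pearson_r(index, gain)
```

Because each item is first reduced to standard scores, items with large raw frequencies do not dominate the composite; any differential weighting is applied afterwards through `weights`.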

When the composite index of immediate 
recall based on standard scores was correlated 
with immediate gain, a coefficient of .82 was 
secured and a coefficient of .53 was found
with delayed gain. The composite scores based 
on weighted standard scores had practically 
the same coefficient of correlation, as is shown 
in Table XXVII. The correlation of the 
weighted series of index scores from the second
study (Table XXVII) with the UWH
score was .13 and with the U score .02, which
emphasizes again the fact that in this study 
teaching procedures appropriate and effective 
in one situation, with one set of objectives, 
were ineffective and inappropriate in another 
situation and with other objectives. 

One criterion of teaching ability employed 
in this study was the rank given by experi- 
enced supervisors to mimeographed transcrip- 
tions of samples of teaching obtained from 
the sound records. In the first investigation 
five experienced supervisors ranked lessons 
taught by 22 teachers in order of merit. The 
results of their ranking are shown in Table
An 

In order to determine the amount of agree- 
ment between supervisors in the evaluation of 
the verbal happenings of a recitation when 
those happenings were recorded in such a 
form that they could be carefully studied, 
analyzed, and compared, correlations were 
run between the ranking given by each super- 
visor and those given by each of the other 
supervisors. The results are given in Table 
XXX. 
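The pairwise comparison of supervisors described above may be illustrated with a short sketch. It assumes the rank-difference (Spearman) coefficient for untied rankings, which the report does not name explicitly, and it uses hypothetical rankings of six teachers by three supervisors; none of the figures below are taken from the study.

```python
from itertools import combinations

def spearman_rho(rank_a, rank_b):
    """Rank-difference correlation for two untied rankings of the same teachers."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical rankings of six teachers (1 = best) by three supervisors.
rankings = {
    "Supervisor A": [1, 2, 3, 4, 5, 6],
    "Supervisor B": [2, 1, 4, 3, 6, 5],
    "Supervisor C": [6, 5, 4, 3, 2, 1],
}

# One coefficient for every pair of supervisors, then the average of all pairs.
pairwise = {
    (s, t): spearman_rho(rankings[s], rankings[t])
    for s, t in combinations(rankings, 2)
}
average_rho = sum(pairwise.values()) / len(pairwise)
```

With five supervisors the same loop yields ten coefficients, whose average corresponds to the kind of summary figure reported in the text.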


TABLE XXX 


CORRELATIONS BETWEEN THE RANKINGS GIVEN
TO 22 TEACHERS BY EACH SUPERVISOR AND
THE RANKING OF THE SAME TEACHERS
BY EACH OF FOUR OTHER
SUPERVISORS

[Table body largely not recoverable from the source scan; the only legible entries are .43* and .27.]

*Ranking of second set of lessons only, as Supervisor III did not rank set one.


As will be noted from Table XXX, the correlations
between the various sets of rankings
made by the five supervisors range from .73
for Supervisors 2 and 5 to -.11 for
Supervisors 2 and 3. The average
coefficient of correlation was .46. This seems
to indicate that while, in general, there was
some agreement between supervisors, there
were also many marked variations. Supervisor
2, for example, tended to rank low the teachers
that Supervisor 3 ranked high, as is indicated
by the negative correlation between the two
rankings, and there was practically no correlation
at all between the rankings given by
Supervisor 3 and Supervisor 5. When the
rankings given to individual teachers are considered,
even greater variations are noted. Thus
Teacher No. 4 was rated as best of the 22 by
Supervisor I, but poorest of the 22 by Supervisor IV.
Teacher 18 was rated as second best
by Supervisor II but 13th by Supervisor IV,
and Teacher 22 was ranked 6th by Supervisor I
and 16th by Supervisor IV. Table
XXXI indicates the number of times each
teacher was ranked in the upper, middle, or
lower third of the total teacher group by the
five supervisors. In 7 cases the same teacher
was ranked in the lowest and also in the
highest third, and in only two cases did the
supervisors all agree on a ranking within a
given third (Teachers 14 and 8). Thus supervisors
disagree markedly in their evaluation of
samples of teaching even when these samples
are presented in such form as to permit careful
study and analysis.

[Footnote: See original report on file, Library, University of Wisconsin.]


TABLE XXXI 


NUMBER OF TIMES EACH OF 22 TEACHERS WAS
RANKED IN UPPER, MIDDLE, OR LOWER THIRD
OF GROUP, ACCORDING TO RANKINGS OF
FIVE SUPERVISORS ON THE BASIS OF
MIMEOGRAPHED RECORDS OF
RECITATIONS

[Table body not recoverable from the source scan. Column headings: Upper, Middle, Lower.]


Two samples of teaching were secured 
from 17 of the teachers taking part in the 
study, one sample from the teaching in Unit I 
and one from the teaching in Unit II. Four of 
the supervisors ranked all the lessons from
Unit I and then at a later date ranked all
lessons in Unit II. It is thus possible to get 
two rankings for each teacher from each of 
the four supervisors, as is shown in Table 
A,.2”” Correlations were then run to deter- 
mine the relationship between the ranking 
given by each supervisor to the 17 teachers 
on their first and second samples of teaching. 
Table XXXII shows the result. 


TABLE XXXII 


COEFFICIENTS OF CORRELATION BETWEEN RANKINGS
GIVEN BY EACH OF FOUR SUPERVISORS
TO THE FIRST AND SECOND SAMPLES
OF TEACHING FROM SEVENTEEN TEACHERS

[Table body missing from the source scan.]


These correlations indicate that there was 
practically no relationship between the rank- 
ings given teachers on the first and second 
samples of teaching by the same supervisors. 
In other words the fact that a teacher was 
ranked high by a supervisor as a result of his 
studying a sample of her work in Unit I was 
practically no indication that she would be
ranked high in the second sample.


The fact that the correlations are so low 
may be due to the inaccuracy of supervisors 
in evaluating teaching, or to variability in 
teacher effectiveness from lesson to lesson, or 
to a combination of both factors. 


Perhaps more important than the amount 
of agreement between supervisors in ranking 
teachers, or between the two rankings of the 
same supervisor, is the correlation between 
the criterion scores based on rankings by 
supervisors and the criterion scores based on
pupil gain. 

The correlation between the ranking of the
16 teachers in the first investigation on the
UWH criterion score and the composite of
supervisory rankings was ρ = -.37, and between
the U criterion score and a composite
of supervisory rankings, ρ = -.11. The correlation
between the ranking of the 10
teachers in the second investigation on the
immediate recall criterion scores and the
supervisory rankings was -.15, and between
the delayed recall criterion score and supervisor
rankings, -.18.

[Footnote: See original report on file, Library, University of Wisconsin.]


In other words it would seem that the 
small relationship which existed between the 
ranking of the teachers by supervisors and 
the ranking of the teachers on the basis of 
pupil gain was negative and that there was a 
slight tendency for supervisors, basing their 
judgment entirely on mimeographed reports 
of teaching, to select poor teachers as good, 
and vice versa, as measured by pupil gain. 

Correlations were also found between the 
index scores from the activity analysis and 
the supervisory rankings as shown in Table 
XI. Analysis of the data given in this table 
seems to indicate that there is a tendency for 
the relationships between the activity index 
scores and pupil gain criteria to be reversed 
in the relationships between these same activities
and supervisory rankings. Thus, as will
be seen from Table XXXIII, of the 84
correlations run between activity index scores
and supervisory rankings only 29 agreed in
sign with correlations based on the U criterion;
in 55 cases the signs were reversed. The
same general situation is found with each of
the other criterion scores in both the first and
second investigations. In the 70 cases where
the sign of the correlation was the same with
both the U and UWH criterion scores, the correlation
with supervisory rankings was of the
opposite sign in 67% of the cases. The same
situation was found to exist in the second
investigation as indicated in Table XXXIII.
Thus in both studies there was a decided 
tendency for procedures which correlated in a 
positive way with pupil gain criteria to have 
a negative correlation with supervisory rank- 
ings, and vice versa. 
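The tally of agreement and disagreement in sign can be sketched in a few lines. The two lists of coefficients below are hypothetical and serve only to show the counting rule; the final line recomputes the per cent of disagreement from the counts reported in the text for the U criterion (29 agreements, 55 reversals).

```python
def sign_agreement(corr_with_gain, corr_with_rankings):
    """Count index scores whose two correlations agree or disagree in sign.

    Pairs in which either coefficient is exactly zero are left out of both counts.
    """
    pairs = list(zip(corr_with_gain, corr_with_rankings))
    agree = sum(1 for g, s in pairs if g * s > 0)
    disagree = sum(1 for g, s in pairs if g * s < 0)
    total = agree + disagree
    pct_disagree = 100 * disagree / total if total else 0.0
    return agree, disagree, pct_disagree

# Hypothetical correlations of five index scores with a pupil gain criterion
# and with the composite supervisory ranking.
gain_r = [0.40, -0.25, 0.10, 0.33, -0.05]
rank_r = [-0.30, 0.20, 0.15, -0.12, 0.08]
agree, disagree, pct = sign_agreement(gain_r, rank_r)

# Per cent of disagreement implied by the reported counts for the U criterion.
reported_pct = 100 * 55 / (29 + 55)
```

On the reported counts the signs were reversed in roughly two cases out of three, which is the pattern the paragraph above describes.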


This study would suggest that when objective
and relatively complete records of the
verbal happenings of recitations are evaluated
by supervisors, there is the same sort of
disagreement, lack of consistency, and inaccuracy
which previous studies have found in
supervisory evaluations based on direct
observation.


SECTION SUMMARY 


1. It is possible to make an analysis of the 
type and frequency of “primary” teaching 
acts from typed transcriptions of sound rec- 
ords and these analyses have a high degree of 
objectivity. 




TABLE XXXIII

FREQUENCY OF AGREEMENT AND DISAGREEMENT BETWEEN SIGN OF COEFFICIENT OF CORRELATION
BETWEEN INDEX SCORES FROM ACTIVITY ANALYSIS AND PUPIL GAIN CRITERIA AS
COMPARED WITH SIGN OF COEFFICIENT OF CORRELATION BETWEEN SAME
INDEX SCORES AND “EXPERT OPINION” CRITERION

Criterion Score                                        Times Agree   Times Disagree   Per Cent of Disagreement
U
UWH
U and UWH have same sign
Immediate Recall
Delayed Recall
Immediate Recall and Delayed Recall have same sign

[Cell values not recoverable from the source scan.]


2. Very few “primary” teaching acts were 
found to have a statistically significant cor- 
relation with the criterion scores of teaching 
success employed in this investigation. Only 
5.3% of the correlations in the first study
were statistically significant and 7.6% in the 
second study. 

3. In the first study those procedures 
which had the highest correlation with the 
criterion score were those connected with 
meaningful discussion rather than with recall 
of assigned material. 


4. In the second study those procedures 
which had the highest correlation with the 
criterion scores were those connected with the 
recall of assigned material. 


5. A composite score formed by combining
scores derived from seven activity index
scores having to do with aspects of meaningful
pupil discussion was found to have high
positive correlations, as a whole, with pupil
gain criteria in the first investigation (range
from .81 to .63, Table XXVII), but high
negative correlations with pupil gain scores in
the second investigation (range from -.55 to
-.68, Table XXVII).


6. A composite score formed by combining 
scores derived from four activity index scores 
having to do with the recall of material 
studied was found to have a high positive cor- 
relation with pupil gain in the second investi- 
gation (range from .86 to .51, Table XXVII), 
but a low or even negative correlation with 
pupil gain in the first investigation (range 
from .19 to .13, Table XXVII). 

7. It would seem that procedures appro- 
priate and effective under the conditions of 
the first study were inappropriate and ineffec- 
tive under conditions of the second study. 

8. Evaluations of teachers by experienced
supervisors, based upon a study of verbal
records of the teacher’s work, were found to be
quite unreliable and correlated low or negatively
with pupil gain.


SECTION V 


A STUDY OF PUPIL GAIN ON PARTIC- 
ULAR TEST ITEMS AND THE 
PROCEDURES INVOLVED 
IN TEACHING 


The procedure employed in the second investigation
made it possible to study the
relation of pupil gain and the teaching procedures
employed. It will be recalled that a
complete experimental pattern of the teaching
was secured in the second investigation. The
second investigation provided for (1) a pretest;
(2) a period of teaching during
which a sound record was made of all the
verbal happenings of the class, this record
being supplemented by evaluations based on
the Barr-Harris Performance Record; (3) a
study period where pupils read their text and
reference material; (4) a final test; and (5) a
delayed recall test at the end of five weeks.

[Footnote: A. S. Barr and Albert E. Harris, Barr-Harris Teacher's
Performance Record (Madison, Wis.: Journal of Experimental
Education, 1943).]

It is apparent that the data and records
here secured give a very complete picture of
the learning situation from the time of the
application of the first pretest to the administration
of the final test. When these data
are studied with reference to the several items
in tests used to measure pupil gain, it can be
determined exactly what study helps, if any,
were given to cover each item, what the
teacher or any pupil said about the material
relating to the item, and exactly what the
textbook had to say. The test items were
organized under topical headings such as
“climate,” “industries,” etc., so that any item
could be quickly located. The reorganized
test items were then typed and duplicated,
leaving several spaces between each item.
Checking one such item at a time, everything
found in the textbook pertaining to each was
written in the blank provided for this purpose.
Thus it was possible to record after each
test item all that was said about that item
during class discussion, who it was that said
it, and whether it was during the introduction
to the lesson or during the recitation following
the study period. On another form the
teaching procedures employed by each teacher
in each class in teaching each item were recorded.
On this same form was also indicated
the per cent of errors made in the pretest
which were corrected in the immediate recall
test. Thus we have a record of the procedures
used in teaching the facts necessary to answer
each test item, and a score indicating the
effectiveness of those procedures in correcting
test errors. The analyses of data from these
forms suggest the relative effectiveness of
each of the procedures, and of several combinations
of procedures, in producing pupil
gain.

A study of the relative effectiveness of different
teaching procedures, as shown in Table
XXXIV, shows that textbook reading alone
was the least effective teaching device, as
only 28% of the pretest errors were corrected
on items only read about. The use of prepared
study helps added to the effectiveness of the
reading, as 37% of the errors were corrected
when they were used. More effective than
written study helps was the presentation of
factual information to the class before the
study period. When teachers called the pupils’
attention to a particular fact, and they later
read in their text about the same fact, it was
found that they corrected 48% of the errors
made in the pretest on items taught in this
way. Teacher presentation of material during
the discussion following a study period was
not nearly as effective as teacher presentation
before the study period, as is shown by the
fact that the former procedure produced a
correction of only 32% of items missed in the
pretest, as compared to 48% by the latter
method.


TABLE XXXIV


GAIN PRODUCED IN TEN CLASSES ON TEST
ITEMS TAUGHT BY DIFFERENT
TEACHING PROCEDURES


                                                                   Per cent of
Procedures Used                                                    Errors Corrected
Material only read                                                        28
Read with study helps                                                     37
Read with teacher presentation during introduction                        48
Read with teacher presentation during discussion                          32
Read with study helps and pupil presentation in discussion                55
Read with teacher and pupil presented material during discussion          33
Read with pupil presented material during discussion                      42


Pupil presentation of material during the
recitation period following the study period
was found to be more effective than teacher
presentation, the per cents of corrections
being 42 and 32 respectively. An explanation
for this may be the fact that the
recitations studied were of the question and
answer type, and that a pupil presentation
had been almost invariably preceded by a
teacher question that had centered the attention
of the class on the point at issue. Thus
we really have both teacher and pupils active
in this situation, and the topic is really
touched on twice, once when the teacher
raises the question, and again when a pupil
answers it. In contrast the teacher presentation
calls the fact to the attention of the class
a single time and perhaps does not center the
attention of the class so effectively as the
teacher question which someone will be called
upon to answer.


It will be noted that less improvement was 
made on items on which both teacher and 
pupils presented material than on items where 
the pupils alone presented materials. This 
may be interpreted as suggesting that the 
teacher’s questions calling forth a pupil dis- 
cussion are of more value than a teacher pre- 
sentation followed by a pupil contribution to 
the same point. The largest per cent of gain, 
55%, was produced on those items covered 
by study helps in the hands of the pupils 
when they read, and which were later dis- 
cussed by pupils during the recitation. 
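The measure used in these comparisons, the per cent of pretest errors corrected on the later test, is simple enough to state as a short sketch. The per cents in `reported` are those given in Table XXXIV and the accompanying text (the value 32 is taken from the prose); the tally of 200 errors and 56 corrections is hypothetical and is included only to show the arithmetic.

```python
# Per cents of pretest errors corrected, as reported for each procedure.
reported = {
    "Material only read": 28,
    "Read with study helps": 37,
    "Read with teacher presentation during introduction": 48,
    "Read with teacher presentation during discussion": 32,
    "Read with study helps and pupil presentation in discussion": 55,
    "Read with teacher and pupil presented material during discussion": 33,
    "Read with pupil presented material during discussion": 42,
}

def percent_corrected(pretest_errors, errors_corrected):
    """Per cent of the errors made on the pretest that were corrected later."""
    return 100 * errors_corrected / pretest_errors

# Hypothetical tally for one procedure: 200 pretest errors, 56 of them corrected.
example = percent_corrected(200, 56)

# Procedures ordered from least to most effective by the reported figures.
ordered = sorted(reported, key=reported.get)
least_effective, most_effective = ordered[0], ordered[-1]
```

Ordering the procedures in this way reproduces the ranking discussed in the text, with reading alone at the bottom and study helps followed by pupil discussion at the top.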


There are of course many possible combi- 
nations of procedures not included in Table 
XXXIV for the reason that in the recorded 
lessons these combinations did not occur, or 
occurred so infrequently as to make it appear 
impractical to include them in our analysis. 




Insofar as the data from this study offer 
evidence it would seem: (1) that assigned 
reading, unguided by either study helps or 
teacher oral presentation before the study 
period, is the least effective procedure; (2) 
study helps add to the mastery of material 
read; (3) teacher discussion of facts later to 
be read about seems to be very effective in 
increasing the mastery of material, and (4) in 
question and answer type of recitations over 
textbook material it seems much more effec- 
tive to have the teacher ask questions which 
the pupils answer, than to have the teacher 
present the factual information herself. 


We shall turn now from an analysis of the 
data from the ten classes combined to data 
concerning the effectiveness of certain proce- 
dures as they were used in individual classes. 


It will be observed in Table XXXV, col- 
umn headed “Errors on Items Only Read 
About” that in every class many errors were 
made on test items that were neither dis- 
cussed by the class, nor called to the pupils’ 
attention in their study helps. So far as 
can be determined all information which 
pupils gained on such items came from their 
work during the study period, and this was 
largely a reading period. When it is recalled 
that the ten classes had been equated on the 
basis of reading ability and that the study 
periods were conducted in such a way that, 
as nearly as could be determined, every class 
had identical opportunities to learn, one might 
suppose that differences in the achievement of 
classes based on gains on test items only read 
about, would be very small. 




A study of Table XXXV indicates that this
was not the case. For some reason some 
classes mastered items only read about much 
more thoroughly than others. Thus class VI 
made 59 fewer errors in the second test on 
items only read about while class X made a 
gain of only 6. As a matter of fact, an anal- 
ysis of our data shows that classes varied 
much more in their achievement on items only 
read about than on items which were dis- 
cussed in class. Thus if we average the mean 
gain per pupil in each of the ten classes on 
items only read about we get 3.1 with a 
standard deviation of 2.1 which gives a coefficient
of variation of 67.7. On the other hand
the average of the mean pupil gain in each 
class on items which “had been taught”, that 
is discussed in class, showed a mean of 9.2 
and a standard deviation of 1.5 which gives 
a coefficient of variation of only 16.5. The 
percent of errors that were corrected on items 
only read about is probably a better measure 
of the relative effectiveness of study in each 
class than the mean gain, as there were many 
more items studied in this way by some 
classes than by others. These per cents range 
from only 3 for class X to 39 for class V. 
The correlation between gains made in each 
class on items only read about and the mean 
pupil gain made in each class, which is our 
criterion score, is .78 while the correlation be- 
tween the gains made on items “taught” and 
the criterion score is .66. In other words the 
teachers of these ten classes, in spite of their 
great spread in professional standing, varied 
much less in the results of their 20 minutes
of teaching during the discussion period than
in the gains their classes made from 20
minutes of reading. This is especially surprising
when we remember that the classes were
equated on the basis of reading ability. The
success of teachers, as measured by our criterion
scores, apparently depended more on
what happened during the study period than
on what happened during the discussion
period.

[Footnote: Coefficient of variation = 100σ/M, where σ is the standard deviation and M the mean.]


TABLE XXXV


PUPIL ERRORS AND GAINS FOR DIFFERENT STUDY TECHNIQUES

[Table body not recoverable from the source scan. Column headings, as far as they can be read: Total Errors Pretest; Errors on items only read; Errors on items taught; Total Gain; Gains on items only read; Gains on items taught; Mean Pupil Gain; Mean Pupil Gain on items only read about; Mean Pupil Gain on items taught; Per cent of gain on items.]
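The comparison of variability between classes rests on the coefficient of variation, taken here in its usual form of 100 times the standard deviation divided by the mean. The sketch below applies that form to the rounded means and standard deviations quoted in the text; the function name is an assumption made for the example.

```python
def coefficient_of_variation(mean, standard_deviation):
    """Standard deviation expressed as a per cent of the mean."""
    return 100 * standard_deviation / mean

# Mean pupil gain per class on items only read about: mean 3.1, S.D. 2.1.
cv_read_only = coefficient_of_variation(3.1, 2.1)

# Mean pupil gain per class on items discussed in class: mean 9.2, S.D. 1.5.
cv_taught = coefficient_of_variation(9.2, 1.5)

# Ratio of the two coefficients, showing how much more the classes varied
# on items only read about than on items taught.
ratio = cv_read_only / cv_taught
```

The first figure agrees with the 67.7 reported. The second comes to about 16.3 from the rounded inputs, slightly below the 16.5 given in the text, which was presumably computed from unrounded values. Either way the classes varied about four times as much, relative to the mean, on items only read about.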

If further investigation should establish 
that the chief differences between “good” and 
“poor” teachers as measured by pupil gain 
can be traced to the effectiveness of the study 
period, it will help to explain the low correla- 
tions found between teacher and pupil activ- 
ity in the discussion period and total pupil 
gain. Likewise, if the results of this study 
were established by further research it would 
indicate that the critical procedure, so far as 
teaching success is concerned, is to properly 
set the stage for effective study. 

A careful investigation of the study sheets 
given the pupils, the teacher’s introduction 
and oral assignment, and the methods of 
study used by each pupil, has failed to make
clear just why some of the classes apparently 
learned so much more effectively during the 
study period than others. 

A quantitative study of various aspects of
the study period showed only three factors
whose correlation with reading gain was statistically
significant. Of these the number of
pupils who took notes had the highest correlation,
ρ = .50. The second was the length
of the notes taken, ρ = .36, and the third
was the number of locations written in on
outline maps, ρ = -.28. This would suggest
that notetaking in classes helps to fix factual
information. It is probably to be expected
that those classes spending the most time on
locating places on outline maps should have
made smaller gains from reading, which is
indicated by the negative correlation.

It will be recalled that one objective set up 
for the teaching of the lesson on Alaska was 
to have pupils learn the location of a group 
of places in Alaska. 

On the twenty-two places located on the
wall map during the 10 recitation periods the
pupils made 56 fewer mistakes in the immediate
recall test than in the pretest, or a gain
of 2.55 per place location. On the sixty-eight
places not located on the wall maps (summation
of places not located in all classes) a
total gain of 96 was made, or an average of
1.4 per place location. The average per pupil
gain in place locations where outline maps
were used was 2.4. The average per pupil gain
on map locations where outline maps were
not used was 1.1. It would thus seem that
mastery of place locations is greatly aided by
calling attention to locations on a wall map
during the discussion period. It also seems
clear that the use of outline maps is very
effective in fixing place locations, as classes
using them gained more than twice as much
in that part of the test dealing with place
locations as classes not using them.


SUMMARY 


This section has attempted to determine 
the effectiveness of certain teaching proce- 
dures from a study of the gains made on test 
items taught by those procedures. It should 
be recognized that the small number of 
classes and pupils included in the study make 
the results little more than suggestive and 
that any generalizations regarding teaching 
under conditions other than those set up for 
this experiment would be unwarranted. 

From the analysis of the data it would seem 
that for the teaching conditions set up in this 
investigation: 


1. Greater gains are made on items dis- 
cussed in class than on items only read about. 

2. Teacher discussion of items before the 
study period causes greater gains than items 
only read about or read about with study 
helps. 

3. Teacher question followed by pupil 
answer produces greater gains than teacher 
presentation plus pupil additions. 

4. Teacher presentation before study 
added much more to the gain than teacher 
presentation after study. 

5. There was a much greater variation be- 
tween classes in gain on items only read about 
than gains made on items discussed and read 
about. 

6. Note taking by pupils as they study had 
a correlation of ρ = .50 with gain made on
items only read about. 

7. There was approximately twice the gain 
made on place location items when those 
items were located on the wall map during 
discussion, or when an outline map was filled 
in during the study period, than when pupils 
were merely told to know the locations. 




SECTION VI 


GENERAL SUMMARY AND 
CONCLUSIONS 


This study has as its major purpose the 
investigation of the relationship between spe- 
cific observable teacher acts and changes 
produced in pupils as measured by tests. 
Knowledge concerning this relationship is 
important both in the evaluation and the 
training of teachers. 

Of the 336 correlations found between the
frequency scores and pupil gain, 191 (57%)
were found to be smaller than the correction
applied for the shortness of the series. Of the
remaining correlations, 40 (12%) were
smaller than their sigmas and 50
(15%) were less than twice their
sigmas; this left only 55 coefficients, or
slightly over 16% of the total, that were
more than twice their standard error.
Only 20, or about 6% of the coefficients, were
statistically significant at the 1 per cent level.
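The breakdown of the 336 coefficients can be checked with a few lines of arithmetic. The counts are those given in the paragraph above; the dictionary labels are paraphrases introduced for the example.

```python
TOTAL = 336  # correlations between frequency scores and pupil gain

counts = {
    "smaller than the correction for shortness of series": 191,
    "smaller than their sigmas": 40,
    "less than twice their sigmas": 50,
    "more than twice their standard error": 55,
}

# The four classes should account for every coefficient.
accounted_for = sum(counts.values())

# Each class as a per cent of the total, rounded to the nearest whole number.
shares = {label: round(100 * n / TOTAL) for label, n in counts.items()}

# The 20 coefficients significant at the 1 per cent level, as a share of all 336.
share_significant = round(100 * 20 / TOTAL)
```

The four classes sum to 336, and the rounded shares come to 57, 12, 15, and 16 per cent, with about 6 per cent significant at the 1 per cent level.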

In this study no single specific observable 
teacher act was found whose frequency or 
per cent of occurrence was always signifi- 
cantly correlated with pupil gain. In other 
words no activity divorced from the particular
learning-teaching situation which gave rise to
it, was found whose frequency or per cent of 
occurrence might be used as a reliable 
measure of teacher effectiveness, measured by 
the conditions set up in this investigation. 

Several possible explanations may be sug- 
gested for these low correlations: 

First, the complexity of the teaching act
is so great, with its many varying and shift- 
ing factors, that a single observable activity 
divorced from others may not produce an 
effect measurable under present conditions. 
The increase in the size of the correlations 
obtained by combining several items would 
seem to support this thought. 

Second, it is possible that acts are not good 
or bad, in general, divorced from the situa- 
tions which give rise to them. The shifting of 
the correlations from negative to positive and 
vice versa, with the shifting of objectives 
would seem to support this thought. 

Third, it is possible that the relationship 
existing between the frequency of occurrence 
of observed activities and pupil gain is curvi- 
linear, rather than linear. That is, there may 
be an optimum frequency of occurrence for 
such activities, and the curve of effectiveness
would rise to this optimum and then fall
again as the frequency went beyond. If this 
should prove to be true it would of course be 
expected that such a correlation technique as 
that employed in this study, based on the 
assumption of a linear relationship, would 
show low correlations. 
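The third explanation can be illustrated with a small constructed case. The data below are entirely hypothetical: pupil gain is made to rise to an optimum at a frequency of 5 and to fall symmetrically beyond it, so that gain depends completely on frequency and yet the linear (product-moment) coefficient is zero.

```python
from statistics import mean

def pearson_r(x, y):
    """Product-moment correlation, which assumes a linear relationship."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical frequencies of some teacher act, observed in nine classes.
frequency = list(range(1, 10))

# Hypothetical gain that peaks at a frequency of 5 and declines on either side.
gain = [25 - (f - 5) ** 2 for f in frequency]

r_linear = pearson_r(frequency, gain)

# Correlating gain with distance from the optimum recovers the relationship.
distance = [abs(f - 5) for f in frequency]
r_distance = pearson_r(distance, gain)
```

Here `r_linear` is zero to within rounding error, while `r_distance` is strongly negative, which shows how a real but curvilinear dependence could pass unnoticed under the linear technique used in this study.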

Fourth, it is possible, as evidence from the 
second investigation suggests, that difference 
in pupil gain may depend primarily upon 
varying factors inherent in the pupil or his 
activities as in a study period, rather than in 
teaching procedures. 

In general the following conclusions would 
seem to be warranted from the data pre- 
sented: 


First, supervisory ratings of teaching based 
upon transcriptions of teaching seem to lack 
reliability and validity. 

The lack of reliability and validity of
ordinary supervisory ratings has frequently
been pointed out. In order to increase the
validity and reliability of supervisory evalu- 
ations it has been suggested that more com- 
plete and objective records might assist in the 
evaluations. While the collection of complete 
and objective information about a teaching 
situation is probably an essential first step 
toward the evaluation of teaching, the mere 
collection of this data does not in any way 
insure objective and valid evaluation. 

Second, there is, in general, little relationship
between specific observable teacher acts
and pupil gain criteria.

In order to increase the objectivity of 
supervisory evaluations it has been suggested 
that the attention of the supervisor should be 
centered on specific, observable teacher acts 
rather than on vague and indefinite patterns 
of procedures, attitudes, and principles. Other 
studies have demonstrated that an activity 
analysis of teachers’ activities can be made 
with a fair degree of objectivity, but the val- 
idity of evaluations based on such an analysis 
of teachers’ activities has never been satisfac- 
torily established. To establish the validity of 
such an approach to the evaluation of teach- 
ing it must be demonstrated that there is a 
significant relationship, either positive or 
negative, between specific, observable, teacher 
activities and some acceptable criterion of 
teaching efficiency. 

This study seems to indicate that, in gen- 
eral, there is very little relationship between 
specific, observable teacher activities of the
type analyzed, and criteria such as here used
based on pupil gain. Not a single activity was 
found that had a statistically significant cor- 
relation with all of the pupil gain criteria. It 
would seem, from the results of this study, 
that the evaluation of teaching, as is fre- 
quently done, on the basis of the relative 
amount of teacher and pupil talk, the average 
length of pupil participations, the number of 
specified types of questions, teachers’ com- 
ments, etc., is without validity. 

Does the fact that there is little correlation 
between such activities as described and pupil 
gain, mean that it makes no difference what 
kind of questions or comments the teacher 
uses, or how little or how much she talks, 
etc.? 

The fact that teaching is a complex activity,
involving almost an infinite number of
specific activities, suggests that the contribution
that each of these activities may make to
the final result will probably be small. Thus 
it is quite probable that many activities are 
pedagogically sound and worthwhile and yet 
their positive correlation with pupil gain 
would be small, and likewise some procedures 
which are actually bad in themselves may 
have such a small influence on the total result 
of the teaching situation as to leave little 
evidence in correlations obtained. 

Third, composite index scores may be
formed which have a relatively high correlation
with certain of the pupil gain criteria
scores.

The correlations between the two composite
index scores and certain of the pupil
gain criteria were high enough (above .80) to
suggest that the efficiency of a teacher in
securing a certain kind of pupil gain might be
quite objectively and validly evaluated on the
basis of such a score. Further study should
be given to this phase of the problem as it is
of great practical importance. If further
investigation should demonstrate that such
composite activity index scores as were
developed in this study could be validly used to
evaluate the results of teaching in reaching
certain goals, a most convenient and practical
instrument for use under field conditions
would have been developed.

Fourth, teaching activities must be appro- 
priate to the objectives set up. 

This study presents consistent evidence
that activities apparently closely related to
pupil gain of one sort, in one study, had no
apparent relation, or even a negative relation,
to pupil gain of a different sort in the other
study. In other words, this study emphasizes
the appropriateness of teacher activities:
activities which are effective in one situation
in reaching a certain objective may be
ineffective or even detrimental in reaching
different objectives in a different situation.







THE IMPROVABILITY OF TEACHERS IN SERVICE 


C. R. Von Eschen
Beloit College 


SECTION I 
THE PROBLEM 


The purpose of this study is to explore fur- 
ther the nature of teaching ability. An attempt 
will be made to answer the three following 
questions: 


1. How effective is a particular supervisory 
program in producing measurable changes in 
pupils with respect to certain stated objec- 
tives? 

2. What is the relationship between pupil 
change in certain basic study skills and read- 
ing and pupil changes in seventh- and eighth- 
grade social studies? 

3. What changes in teachers seem to be 
most closely associated with teaching success 
where teaching success is defined in terms of 
measurable pupil changes? 


SECTION II 
PROCEDURES EMPLOYED 


GENERAL OVERVIEW OF THIS INVESTIGATION 


The investigation herein reported employed
the experimental method with both single and
equated groups. It relates to seventh- and
eighth-grade teachers in one- and two-room
rural schools of Dane and Columbia counties
in the state of Wisconsin.¹ The application of
the equivalent-group method involved the
selection of two groups: one to serve as a
control group and the other as the experimental
group. Initial tests (IT) and final tests (FT)
were given to each group and the change (C)
observed. Supervision was introduced into the
experimental group as the experimental
factor.
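
The logic of the equivalent-group comparison may be illustrated by a brief sketch. It assumes that the change (C) for each pupil is simply the final test score (FT) minus the initial test score (IT), and that the two groups are compared on their mean change; the function names and the scores shown are hypothetical and are not taken from the study.

```python
from statistics import mean


def change_scores(initial, final):
    """Return the per-pupil change C = FT - IT for paired score lists."""
    if len(initial) != len(final):
        raise ValueError("initial and final score lists must be the same length")
    return [ft - it for it, ft in zip(initial, final)]


def mean_change_difference(control, experimental):
    """Mean change of the experimental group minus that of the control group.

    Each argument is an (initial_scores, final_scores) pair.
    """
    control_change = change_scores(*control)
    experimental_change = change_scores(*experimental)
    return mean(experimental_change) - mean(control_change)


# Made-up scores for illustration only (not data from the study).
control = ([10, 12, 15], [14, 15, 18])
experimental = ([11, 12, 14], [17, 18, 19])
print(mean_change_difference(control, experimental))
```

A positive difference would suggest, under these assumptions, a larger average gain in the supervised group; whether such a difference is statistically dependable is a separate question that the sketch does not address.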

The control group was composed of 90
pupils in the seventh grades of 24 one- and
two-room rural schools. The data for this
group were collected during the school year
of 1937-38. The experimental group was
made up of 104 seventh-grade pupils taught
by the same teachers as the 1937-38 control
group, during the school year of 1938-39.

¹ See Carter V. Good, A. S. Barr, and Douglas E. Scates,
The Methodology of Educational Research (New York: D.
Appleton-Century Company, 1936); also William A. McCall,
How to Experiment in Education (New York: Macmillan Co.,

The single group part of this study con- 
sisted of the 90 pupils who were in grade 
seven in twenty-four schools in 1937-38 and 
in grade eight in the same schools and under 
the same teachers in 1938-39. The pupils 
were tested at both the beginning and end of 
the school years of 1937-38 and 1938-39 and 
the differences computed. By comparing the
performance of pupils when in grade seven
with their performance in grade eight, it was
thought that the relationship between the
final test scores and the change scores would
give some indication of the improvability of
teachers, and that the effectiveness of the
supervision supplied could be inferred.


CHARACTER OF TEACHERS PARTICIPATING IN 
THE INVESTIGATION 


The data for the control group in this
investigation were collected during the school
year of 1937-38² and are described on page 54.

Fifty-seven teachers in one and two-room 
rural schools in southern Wisconsin counties 
participated in the 1937-38 study. Twenty- 
four of these were selected for the experi- 
mental group, using the following criteria: 


1. That they teach in the same school as 
during 1937-38. 

2. That they have classes in grades seven 
and eight—particularly grade seven. 

3. That they contemplate no major change 
in the materials of instruction and socio- 
physical environment of their school. 

. That they participate on a voluntary 
basis. 

. That their school be so located with 
reference to other schools selected as to 
facilitate frequent visits throughout the 
school year. 


Information concerning several factors
relating to these participating teachers is
summarized in Table I. Three of the teachers
were men and 21 were women. Their
chronological age ranges from 22 to 55 years with
the mean at 33.8 years. These teachers are
one year older than when they participated as
teachers in the control group.

² ... collected by R. E. Gotham Carlson


TABLE I
SEX, AGE, TRAINING, EXPERIENCE, AND SALARY OF PARTICIPATING TEACHERS

Training² beyond high school (yrs.), one entry per teacher:
1+S, 1, 1+S, 1, 1, 1, 2+S, 1+S, 1, 1, 1, 2+S,
2+S, 2, 1+S, 1, 2, 1+S, 1, 1+S, 2+S, 2, 2, 4+S
Mean: 1.46+S

1 At time of this study.
2 Typically county normal school training.
S Summer school.


They range in training beyond high school 
from 1 to 4 years, with added summer school. 
Only 2 of the 24 teachers attended summer 
school during the summer of 1938. The 
county normal school was the most common 
source of institutional training. The range in 
experience was from 4 to 31 years, with the 
average at 13.1 years. 


The monthly salary for the 1937-38 school
year ranged from $80.00 to $145.00. The
average monthly salary was $97.84. Only in
two cases were there increases in salary for
the 1938-39 school year, and these amounted
to $5.00 per month.


The typical teacher in this study may be
described as a woman 34 years of age with
1.5 years of training beyond high school plus 
some summer school work previous to the 
summer of 1938. She has taught 13 years and 
has a monthly salary of $98.00. 


[Table I, remaining column headings: Summer School; Teaching¹ Experience (years); Monthly Salary (dollars), 1937-38]


CHARACTER OF THE SCHOOLS PARTICIPATING
IN THE INVESTIGATION


The character of the schools in which these 
teachers taught is shown in Table II. Seven 
of the schools were two-room rural schools, 
and 17 were one-room rural schools. A two- 
room rural school is a school having two 
rooms and two teachers, one teacher teaching 
grades I through IV and the other teaching 
grades V through VIII. A one-room rural 
school is a school having one room and one 
teacher, who teaches all grades from I 
through VIII. 


The district per pupil valuation of these 
schools in 1937-38 ranged from $5,171.00 to 
$16,722.00. The average per pupil valuation
for the schools was $9,660.63.


The per pupil school costs for 1937-38 
ranged from $31.00 to $164.00. The mean per 
pupil school cost was $56.75. 


There was no appreciable change in either 
the per pupil valuation or per pupil school 
cost for these schools in 1938-39. 







TABLE II
SCHOOL INFORMATION: TYPE,¹ VALUATION, COST, AND ENROLLMENT

Per Pupil School Valuation, 1937-38 (dollars; surviving entries):
6063, 7576, 7242, 12210, 9857, 11704, 5171, 8341,
7189, 13889, 8333, 6676, 7660, 8450, 10304, 6871
Mean Per Pupil School Cost (dollars): 56.75

1 A one-room school is interpreted as a school having one room and one teacher who teaches
grades I through VIII. A two-room school is a school having two rooms and two teachers, one teaching
grades I through IV, and the other grades V through VIII.
2 Total pupils taught by participating teacher.
4 Pupils enrolled at beginning of year.


CHARACTER OF THE ENROLLMENT OF THE
SCHOOLS PARTICIPATING IN
THE INVESTIGATION


The character of the enrollment in the par- 
ticipating schools for both the control and 
experimental years is shown in Table II. The 
total enrollment for 1937-38 ranged from 18
to 45 students with an average enrollment of 
28.7 pupils. For 1938-39 the total enrollment 
ranged from 12 to 48 students with a mean 
enrollment of 28.1. 


The average 1937-38 seventh-grade enrollment
was 4.13 pupils, and the average 1938-39
seventh-grade enrollment was 4.54.
Corresponding enrollment averages for the eighth
grade were 4.04 and 4.71. The 1937-38
figures for both grades include only those
students who remained in the particular
school throughout the entire year. The 1938-39
figures are for pupils enrolled at the beginning
of the school year. If both enrollments
were figured on the same basis, the slight
difference, as indicated in the table, would
disappear since more pupils were enrolled
during the year in 1937-38 in grades seven
and eight than were in these grades throughout
the entire period of investigation.

[Table II, enrollment columns. Headings: Enrollment 1937-38 (No. Pupils, Pupils Taught, 7th, 8th); Enrollment 1938-39 (No. Pupils, Pupils Taught, 7th, 8th). Means: 1937-38, 7th 4.13, 8th 4.04; 1938-39, No. Pupils 28.1, 7th 4.54.]
Surviving per-school entries whose mean is 4.04 (apparently eighth grade, 1937-38):
7, 4, 3, 3, 3, 2, 0, 2, 5, 4, 1, 4, 3, 4, 2, 4, 9, 5, 3, 3, 4, 4, 6, 12

3 Only pupils who were ... in 1937-38 study.


It is obvious that the class size is small for 
all schools, since the participating schools are 
one and two-room rural schools. For the par- 
ticular area of the curriculum investigated by 
this study, in both the control and experi- 
mental years, the schools followed the prac- 
tice of combining grades seven and eight. 


An examination of Table II and the fore- 
going description indicates that both school 
and enrollment factors are adequately con- 
trolled. 


Justification for carrying out this study and
previously mentioned related investigations in
one- and two-room rural schools is found in
the fact that 80% of the schools in the state
of Wisconsin are one-room schools.³ The
influence of other teachers and of other factors
is thus in part eliminated.


COLLECTION OF DATA RELATIVE TO THE
CONTROL GROUP


Reference has been made to the fact that
this study is one of a series of related
investigations. The data for the control group used
in this study were gathered as a part of an
earlier investigation.³ᵃ


COLLECTION OF DATA RELATIVE TO THE
EXPERIMENTAL GROUP 


The data for the experimental group were 
collected on seventh and eighth grade pupils 
during the 1938-39 school year. 

The procedure followed in the collection of 
data was (1) to administer a battery of tests 
designed to measure the general outcomes of 
the year’s course to all participating pupils at 
the beginning of the school year so as to 
obtain measures of long-time pupil change 
occurring over a six months period; (2) to 
apply appropriate pupil measures just prior 
to and at the completion of the teaching of 
each of two three-week units in the field of 
citizenship (one of the units, Unit I, related 
to safeguarding public health and was taught 
in the fall of 1938; and the other, Unit II, 
related to community planning and was 
taught in the spring of 1939); (3) to admin- 
ister an intelligence test, a socio-economic 
scale, and a reading test at the beginning of 
the year to secure information on the potency 
of these factors in relation to pupil change; 
and (4) to apply various measures of teach- 
ing ability, or of factors thought to be related 
to teaching ability, to the teachers themselves. 

By staggering the testing program over a 
period of time it was possible for the investi- 
gator and his assistants to administer all the 
pupil measures in person. The time schedule 
was so controlled that there was a six months 
period between six of the initial and the final 
long-time measures and a seven months period 
between two of the initial and final unit tests. 
The number and length of class periods were
held constant.

In the fall of 1938 and before the teaching
of Unit I was begun, all pupils were
administered a battery of tests designed to measure
long-time changes in information, attitudes,
basic study skills and reading, skill in
organizing, and ability to apply generalizations to
social studies events. Pupil intelligence and
socio-economic status were also measured.⁴

³ Biennial Survey of Education, Vol. II (Washington, D. C.:
United States Department of the Interior, Government Printing ...

³ᵃ J. F. Rolfe, “Measurement of Teaching Ability: Study
Number Two,” Journal of Experimental Education, XIV
(September, 1945), pp. 52-74.


The control group was measured in reading 
at the beginning of the school year only. This 
measure was applied to the experimental 
group at the beginning of the year and again 
seven months later. The test of basic study 
skills was applied only to the experimental 
group and was given in the fall and again 
seven months later. 


Immediately following the application of 
these tests, the supervisory program described 
at length in Section V was begun. 

Just prior to the teaching of Unit I,
relating to safeguarding public health, the test
over this unit was administered to each class.
Teaching on this unit continued for thirteen
consecutive school days and on the fifteenth
day the same test was administered as a final
test.


The same procedure was followed in testing 
Unit II, relating to community planning, 
taught in the spring. 

Shortly after teaching Unit II the long-time 
tests administered in the fall were readmin- 
istered to all pupils in the spring. 

At the time this was done in the spring 
four measures, described in Section IV, were 
administered to the participating teachers. 
They had been given these same tests along 
with several others during the school year of 
1937-38. These particular tests were selected
for readministration because of the relationship
which Rostker⁵ found to exist between
the areas measured by these tests and measurable
changes produced in pupils. The educational
objectives in his study and in this
investigation were the same.

He found that four of twenty-seven teacher 
measures which he correlated with a com- 
posite criterion of pupil change showed cor- 
relations ranging from .37 to .50. These 
measures covered the general areas of: (1) 
social attitudes, (2) knowledge of subject 
matter, (3) attitude toward teaching and the 
teaching profession, and (4) mental hygiene 
as related to teacher-pupil relationship. 


⁴ All tests used will be described in Section III.

⁵ L. E. Rostker, “Measurement of Teaching Ability: Study
Number One,” Journal of Experimental Education, XIV
(September, 1945), p. 44.







Tests over these four areas were administered
in the spring of 1939 to the teachers
in order to determine the effectiveness of that
aspect of the supervisory program which
attempted to bring about improvement in
these certain teacher qualities thought to be
closely related to teaching success.


SECTION III 


DESCRIPTION OF PUPIL AND 
TEACHER MEASURES EMPLOYED 
IN THE INVESTIGATION 


This section will describe the measures 
applied to the pupils and teachers who par- 
ticipated in the investigation. 


PUPIL TESTS


The tests applied to pupils fall into two 
classes: (1) those measuring factors condi- 
tioning learning, and (2) those measuring 
pupil change with reference to the objectives 
described in Sections II and IV. 


Measures of factors conditioning learning.
Three measures of factors conditioning learning
were used: 1. Kuhlmann-Anderson Intelligence
Test, Fourth Edition, Grades VI-VIII;⁵ᵃ
2. Traxler Silent Reading Test, Form 1, for
Grades 7 to 10;⁵ᵇ and 3. Sims Score Card
for Socio-Economic Status, Form C.⁵ᶜ


Measures of pupil achievement or change. 
The measures of pupil achievement were of 
two sorts: (1) those measuring short-time or 
unit change (applied at the beginning and 
end of a three-weeks period) and (2) those 
measuring changes in the general objectives 
of the course (administered at the beginning 
and end of the six months period). 

The tests for measuring short-time or unit 
change over the two three-week periods used 
in this study and related investigations were 
especially constructed for use in the series of 
studies. Two unit tests were employed: 
1. Unit I, Safeguarding Public Health, and 
2. Unit II, Community Planning.⁶

⁵ᵃ Published by the Educational Test Bureau, Inc., Minneapolis.
For description of this test see L. E. Rostker, “Measurement
of Teaching Ability: Study Number One,” Journal
of Experimental Education, XIV (September, 1945), p. 12.

⁵ᵇ Published by the Public School Publishing Co., Bloomington,
Illinois. For a description of this test, op. cit., p. 12.

⁵ᶜ Published by the Public School Publishing Co., Bloomington,
Illinois. For a description of this test, op. cit., p. 12.

⁶ The preliminary forms of these tests were constructed in
1936 at the State Teachers College, Whitewater, Wisconsin,
by C. P. Daggett and a class in Test Construction.
For description of these tests, op. cit., p. 14.




The test items used in these two unit tests 
were carefully checked for curricular validity 
with materials used most widely in Wisconsin 
Schools. 

Two batteries of tests were used to measure 
long-time change—a series of three tests de- 
veloped by J. Wayne Wrightstone and a 
series of three measures developed by Howard 
C. Hill. These tests were described by
Rostker.⁶ᵃ


The remaining test, measuring long-time 
change, used in this study was The 1938 
Iowa Every-Pupil Tests of Basic Skills, for 
Grades 6, 7, and 8, Test B: Vocabulary, 
Basic Study Skills. This test is organized 
into six parts concerned with: (1) general 
vocabulary, (2) comprehension of maps, (3) 
reading of graphs, charts, and tables, (4) use 
of basic reference material, (5) use of the 
index, and (6) use of the dictionary. 

In the Manual for Administration and
Interpretation of this test the authors report
that experience shows that independent tests
calling for responses to as many as 75 to 100
items in a 10 or 15 minute working period will
uniformly produce statistical reliabilities of
.90 or above. For this reason and because of
the unusual length and extensity of the
sample provided in the Iowa Every-Pupil
Test of Basic Skills, no statistical evidence
of the reliabilities is deemed necessary.

In order to facilitate further discussion in 
this report, the various pupil tests will be 
designated by the following symbols: 


Symbol   Pupil Test
U₁       Unit I, Safeguarding Public Health
U₂       Unit II, Community Planning
W₁⁷      Abilities to Organize Research Material
W₂       Scale of Civic Belief
W₃       Applying Generalizations to Social Studies Events
H₁       A Test of Civic Attitude
H₂       A Test of Civic Information
H₃       A Test of Civic Action
TR       Traxler Silent Reading Test
BS       Iowa Every-Pupil Test of Basic Skills








The reliabilities of the tests used to
measure pupil achievement are reported in
Table III. Reliability coefficients are reported
for (1) initial test and (2) final test for both
the experimental and control years.⁷ᵃ The
reliabilities of the 1937-38 tests were obtained
by taking 150 cases from the total
group of 404 cases, dividing the initial and
final tests into odd and even halves, correlating
these halves for a particular test, and
correcting the obtained coefficient by the
Spearman-Brown formula.

⁶ᵃ Op. cit., pp. 12-14.

⁶ᵇ (Iowa City, Iowa: Bureau of Educational Research and
Service, University of Iowa.)

⁷ W₁, W₂, and W₃ tests were constructed by J. W. Wrightstone;
the H₁, H₂, and H₃ tests were constructed by H. C. Hill.

The reliabilities for the 1938-39 tests were
obtained by using the total group of 194
cases. The initial and final tests were divided
into odd and even halves. These halves on
each particular test were correlated and the
obtained coefficient corrected by the
Spearman-Brown formula.
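
The odd-even procedure just described may be sketched as follows. The sketch assumes that each pupil's paper is available as a list of item scores, that the two half-test scores are the sums of the odd-numbered and even-numbered items, and that the Spearman-Brown correction for a test of doubled length is r = 2r' / (1 + r'), where r' is the correlation between the halves. The function names and the illustrative data are hypothetical and are not drawn from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step the half-test correlation up to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability, corrected by Spearman-Brown.

    item_scores holds one list of item scores per pupil.
    """
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))


# Made-up item scores (1 = right, 0 = wrong) for three pupils.
papers = [[1, 1, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
print(split_half_reliability(papers))
```

The correction is needed because the correlation between two half-tests understates the reliability of the full-length test; a half-test correlation of .50, for example, steps up to about .67.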


TABLE III
RELIABILITIES OF PUPIL TESTS

Column headings: 1937-38 Control Year (Initial, Final); 1938-39 Experimental Year (Initial, Final)


TEACHER TESTS 


Thirteen measures were applied to the
teachers who participated as members of the
control group in 1937-38.⁸ Four of these
thirteen measures were readministered to the
twenty-four teachers who participated during
the experimental year (1938-39). These four
tests were:⁸ᵃ


1. The American Council Civics and
Government Test, Form B.

2. Social Attitudes of Secondary-School
Teachers.⁹

3. A Test of Teacher-Pupil Relationship. 

4. Scale for Measuring Attitude Toward 
Teaching and the Teaching Profession. 


⁷ᵃ Reliabilities for Control Year were supplied by Lee
Howard Mathews, unpublished Ph.D. thesis, University of
Wisconsin, ...

⁸ See unpublished Ph.D. thesis on file in the library,
University of Wisconsin.

⁸ᵃ For a description of these tests see L. E. Rostker,
“Measurement of Teaching Ability: Study Number One,” Journal
of Experimental Education, XIV (September, 1945), pp. 16,
17, and 19.

⁹ Designed for the National Survey of Junior High School
and Senior High School Teachers under auspices of John
Dewey Society. The test is described in The Teacher and
Society ... by permission of Dr. G. W. Hartman, author.




The reliabilities of these four teacher
measures as found by Mathews¹⁰ in his study
involving fifty-seven teachers are reported in
Table IV.


TABLE IV
RELIABILITY OF FOUR TEACHER MEASURES


Teacher Measures                               Reliability
American Council Civics and Government             .87
Social Attitudes of Secondary Teachers             .94
Test of Teacher-Pupil Relationship                 .89
Yeager Attitude Toward Teaching (Interest)         .72







SECTION IV 
THE SUPERVISORY PROGRAM 


PRINCIPLES OF SUPERVISION 


The supervisory program employed in this 
investigation was planned with reference to 
certain principles held to be basic to a good 
supervisory program. These principles are the 
general rules or basic concepts by which one 
proceeds from one concrete situation to another.
Barr, Burton, and Brueckner¹¹ point
out that principles are enormously important
in governing action, and in the control of
technique. The basic principles considered of
great importance in planning the supervisory
program carried out in this investigation were
as follows:¹²


1. The purpose of supervision is to im- 
prove the learning of pupils. 

2. The ultimate criteria by which the 
effectiveness of supervisory programs may be 
measured are the changes produced in pupils 
when measured with reference to the objec- 
tives to be attained. 

3. Good supervision is democratic in that it:


a. respects the personality of teachers and
pupils.

b. deals sympathetically with the human
element in supervision, teaching, and
learning.

c. provides opportunity for and encourages
freedom of expression and action in
planning and other matters of instructional
policies.

d. emphasizes the cooperative character of
supervision wherein teacher, pupil, and
supervisors all work together harmoniously
for the improvement of instruction.


¹⁰ Lee Howard Mathews, ...ities Associated with Teaching ...,
Unpublished Ph.D. Thesis, University of Wisconsin, ...

¹¹ A. S. Barr, William H. Burton, and Leo J. Brueckner,
Supervision (New York: D. Appleton-Century Company, ...

¹² The author is indebted to A. S. Barr, William H. Burton,
and Leo J. Brueckner for the original statement of principles
from which this adaptation is made.


4. Good supervision is goal-centered and ...

5. The objectives of the supervisory program
should arise from a study of the needs
of the particular situation.

6. A problem-consciousness on the part of 
the teacher is a pre-requisite for learning to 
teach. 

7. Good supervision recognizes individual
differences in teachers and situations and
makes provisions for these in the supervisory
program.

8. Teachers should have definite informa- 
tion as to the progress they are making in 
instructing their pupils and in the improve- 
ment of instructional procedures. 

9. The exposure of teachers to desirable 
experiences may or may not produce desir- 
able changes. The changes must be proven. 

10. The teacher’s attitude is of primary 
importance to the effectiveness of the super- 
visory program. She must: 


a. be intrinsically interested in the
supervisory program.
b. be personally convinced of the value of
the educational objectives to be attained.
c. participate voluntarily, if possible, in
the program set up.
d. have confidence in the ability and effort
of the supervisor.
e. feel that she has a vital role to play in
the planning and execution of the
supervisory program.


11. Good supervision is well planned. 
12. The supervisory aids must: 


a. be related as closely as possible to the 
existing programs of instruction. 

b. recognize that the participating teachers 
have recognized responsibilities to their 
administrative superiors. 

c. make use of the resources of both school 
and community such as they may be. 


THE SUPERVISORY PLAN


The general pattern of this study was 
somewhat determined, as has already been 
pointed out, by a series of related studies. 
Hence, the type of the supervisory program 
was also in part determined by the general 
objectives set up for the year 1937-38 and 
the measuring instruments employed in deter- 
mining the degree to which these objectives 
were attained during this school year. 

Specific plans for the supervisory program
here pursued were made during the summer
of 1938. A careful study was made of the
general objectives as accepted by the teachers
for the control year.¹³ The pupil tests used
to measure these objectives were also carefully
studied.¹⁴ The broad objectives were
then broken down into specific objectives and
a bulletin, Bulletin No. I, Objectives Measured
by Various Pupil Tests,¹⁵ was prepared.

A number of studies by Burton,¹⁶ Case,¹⁷
McCallister,¹⁸ Salisbury,¹⁹ Simpson,²⁰ and
Smith²¹ support the assumption that efficiency
in study-type reading skills and basic
study skills will increase performance in content
subject areas, ability to reason, and the
understanding of logical relationships. Gray²²
states that “there is a fair degree of positive
relationship between reading achievement and
class marks in all subjects.” Horn²³ reports
that “a number of investigations have shown
that at every level from the elementary school
to college it is possible to increase greatly
general reading ability of students and that
this ability is reflected in higher scholarship.”

In light of these statements and the
supporting evidence, it was decided to incorporate
in the supervisory program a program for
the development of reading and basic study
skills, in the hope that improvement in these
areas would bring about some improvement
in the areas under investigation.


¹³ See J. F. Rolfe, “The Measurement of Teaching Ability:
Study Number Two,” Journal of Experimental Education,
XIV (September, 1945), p. 53.

¹⁴ See unpublished Ph.D. thesis on file in the University of
Wisconsin library.

¹⁵ See unpublished Ph.D. thesis on file in the University of
Wisconsin library.

¹⁶ W. A. Burton, Outlining as a Study Procedure, Contributions
to Education No. 411 (New York: Teachers College,
Columbia University, 1930).

¹⁷ Luther Case, Standardized Tests for M... Comprehension ...
Reading, ...

¹⁹ ... Salisbury, “Some Effects of ... in Outlining,” ...
Journal, Vol. XXIV, ...

²⁰ ... Simpson, “The Effect of ... to Read Historical ...,”
... Vol. XX (December, 1929), ...

²¹ W. S. Smith, “Correlation of Abi... General Grades in High
School,” School Review, ... pp. 493-...

²² ... Related to Read...

²³ ... Scribners, 1937), p. ...


In order to measure change in reading skill,
the Traxler Silent Reading Test was administered to
the pupils in the fall and again seven months 
later. The Iowa Every-Pupil Test of Basic 
Study Skills, selected to measure basic study 
skills, was administered at the same time. 


Horn²⁴ points out that learning, to be
effective, must be definitely directed through
actual practice lessons specifically set up to
improve definite skills and that “the evidence
indicates that the more closely exercises in
reading are integrated with the fields in
which reading is used the more beneficial the
result will be.” An attempt was made, through
the preparation of a supervisory bulletin,
Practical Helps for Improving Instruction
and Pupil Achievement (Bulletin No. II),
to provide a close tie-up between the
development of specific reading and study
skills and the social science material found
in the participating schools. The bulletin
presents a series of illustrative exercises
typical of those which may be used to develop
basic reading and study skills. Teachers were
encouraged to develop more of the specific
type which they found were needed. These
exercises are built upon material found in
four social science texts in actual use in the
participating schools. They are not of the
“extra work” type but are integrated with
the regular instruction of the social studies.


The bulletin was to be used for remedial 
teaching based on the results of pupil per- 
formance on the initial test of reading ability 
and on the initial test of basic study skills. 
These tests were given in the fall and the
results on these tests for each pupil were
made immediately available to the participating
teachers in a section of Bulletin No. III,
Report of Pupil Performance.²⁵ Standards on
the parts of the tests were provided so that 
the teachers would have some indication of 
the strengths and weaknesses of the indi- 
vidual pupils in the various skills measured. 


In addition to Bulletin No. II and Bulletin
No. III, referred to previously, four other
supervisory bulletins were prepared and
distributed to the participating teachers.


²⁴ See unpublished Ph.D. thesis on file in the University of
Wisconsin library, p. 201.

²⁵ See thesis on file in the University Library, University
of Wisconsin.

Bulletin No. I, Objectives Measured by
Various Pupil Tests,²⁶ presents a list of the
specific educational objectives of the super- 
visory program. These objectives are pre- 
sented in relation to the instruments purport- 
ing to measure them. 


Bulletin No. IV, Generalizations in Social
Studies,²⁷ discusses the meaning of a
“generalization” and the importance of teaching
pupils how to make and apply generalizations. 
It presents a list of generalizations which 
have been built up by social studies author- 
ities and offers suggestions which will aid 
teachers in teaching pupils how to make and 
apply generalizations. 

Bulletin No. V, Suggestions for the Improvement
of Personality,²⁸ is designed to offer
teachers a practical plan for the improvement 
of personality as it relates to teaching. The 
plan as presented is based upon suggestions 
made by authorities in the field of personality 
development. 

Bulletin No. VI, Reading for Improvement 
in the Research Project,²⁹ lists books which
might be read by teachers for their improve- 
ment in certain areas thought to be related 
to teaching success. 

In order to give careful direction to the 
supervisory program, a schedule of school 
visitation was drawn up together with a list 
of specific activities to be carried out during 
each visitation. Because of the assistance 
given the investigator in the routine matters 
of test administration, scoring, and necessary 
clerical work, it was possible to carry the pro- 
gram through with very little deviation from 
this schedule. Thus, the schedule to be pre- 
sented is typical of the general program as 
carried out in each participating school. It 
should be pointed out that this uniform gen- 
eral program did not preclude an individu- 
alized program within each school. The gen- 
eral schedule follows: 

First visit.—An initial visit was made to all 
teachers and schools chosen as suitable for 
the experimental group. The general program 
was outlined and their cooperation invited. 


26 See original thesis on file in the University Library, University of Wisconsin.

27 See original thesis on file in the University Library, University of Wisconsin.

28 See original thesis on file in the University Library, University of Wisconsin.

29 See original thesis on file in the University Library, University of Wisconsin.





December, 1945) 


Thirty teachers were visited. Six of these de- 
clined to participate. Twenty-four agreed to 
participate for a second year. Of this group 
one was hesitant due to a community attitude 
which had grown up concerning the previous 
year’s program. She, however, consented to 
participate in light of what she thought
appeared to be a more “practical” program.
It was evident in the case of three others who 
agreed to participate that their attitude was 
not one of cooperation. 

Second visit.—1. Administered initial Traxler
Reading Test (TR) and Iowa Basic Study
Skills Test (BS). 

2. Discussed the statement of general 
objectives for the year’s course in Citizenship 
which teachers had accepted for the previous 
year³⁰ (about 75% of the teachers did not
recall these objectives or otherwise seem to 
know what the investigator meant when he 
discussed objectives). 

3. Gave teachers Bulletin No. I, Objectives 
Measured by Various Pupil Tests. 

4. Discussed in detail the specific objec- 
tives set up for improvement in reading, basic 
study skills, and abilities to organize research 
material.³¹

5. Gave teachers Bulletin No. II, Practical 
Helps for Improving Instruction and Pupil 
Achievement. Showed them the general organ- 
ization of the bulletin and suggested that they 
examine it carefully and critically before next 
visit. Suggested that they make any use they 
cared to of the material in the bulletin. The 
initial test over these objectives was given 
during this visit. 

6. Informed teachers that on next visit 
they would be given the test results of their 
pupils on: 

a. the initial reading test given the 
seventh- and eighth-grade classes on 
this visit. 

b. the initial basic study skills test given 
the same pupils on this visit. 

c. the initial, final, and change scores on 
all tests given last year to the last year’s 
seventh-grade class (this year’s eighth 
grade class). 


Third visit.—1. Administered Kuhlman-
Anderson Intelligence Test, Abilities to
Organize Research Material (W₁), and Scale
of Civic Beliefs (W₂).

30 ... Ability: ..., Journal of Experimental Education, XIV (September, 1945), p. 8.

31 See original thesis on file in the University Library, University of Wisconsin.


IMPROVABILITY OF TEACHERS IN SERVICE 


143 


2. Gave teachers Bulletin No. III, Report 
of Pupil Performance. 

3. Examined the pupil scores (reported in 
Bulletin No. III) on the initial TR and BS
tests. Gave teachers a set of norms for the
various parts of these tests. Noted, with them,
the strengths and weaknesses of individual
pupils. Discussed with them procedures and 
plans for individualizing instruction so as to 
improve these weaknesses. Pointed out how 
Bulletin No. II provides typical exercises for 
developing specific skill. (Only about 25% 
of the teachers had approached the improve- 
ment of instruction from this diagnostic and 
remedial point of view). 

4. Requested that teachers examine care- 
fully Bulletin No. I before the next visit. 
Suggested that they begin work toward those 
objectives measured by W₂. (They had
already begun on TR, BS, and W₁).

5. Gave teachers a copy of the objectives 
for Assignment No. I: Unit I, Safeguarding
Public Health.³² Discussed these objectives in
detail. Discussed the organization of subject 
matter and the materials of instruction for 
teaching the unit—drew on experience which 
teachers had with the unit the previous year. 
Suggested that teachers begin the collection 
of materials and plan the organization of sub- 
ject matter for teaching the unit. 

Fourth visit.—1. Administered Applying
Generalizations to Social Studies Events
(W₃), Test of Civic Attitude (H₁), Test of
Civic Information (H₂), and Test of Civic
Action (H₃).

2. Discussed at length the specific objectives
covered by W₂, W₃, H₁, H₂, and H₃.³³

3. Examined with the teacher in much de- 
tail Bulletin No. III giving the performance 
of her previous year’s seventh-grade pupils 
(now eighth graders) on all the tests given 
the preceding year (1937-38). These same 
tests were used during the experimental year. 
Bulletin No. III reports the initial, final, and 
change scores on all tests for each seventh 
grade pupil enrolled throughout 1937-38. 
Initial scores for 1938-39 TR and BS are 
also reported for the seventh- and eighth-grade 
classes. Negative change scores indicate that 
a pupil actually lost in raw score during the 
instructional interval between the initial and 
final administration of a particular test. (The
frequency with which these negative scores

32 See original thesis on file in the University Library, University of Wisconsin.

33 See original thesis on file in the University Library, University of Wisconsin.







appeared in some schools was a source of 
much surprise and disturbance. Typical of 
the teacher remarks was, “Do you mean that 
he actually knew more about this test at the 
beginning of the year than he did at the end? 
Why! how can that be?” The rather small 
gains and frequent negative change scores 
served as a motivating influence in some 
cases.) 

Fifth visit.—1. Administered initial Unit I,
Safeguarding Public Health Test (U₁).

2. Gave teacher Bulletin No. IV, General- 
izations in Social Studies Events. (The area 
covered by this bulletin was a source of 
trouble to most of the teachers and was pre- 
pared in the hope that it would aid them in 
teaching pupils how to make and apply gen- 
eralizations in the field of social studies.) 

3. Discussed with teachers in detail the 
problem of teaching pupils how to make and 
apply generalizations in the field of social 
studies. Used Bulletin No. IV as a point of 
departure. 

Sixth visit.—1. Administered final Unit I,
Safeguarding Public Health Test (U₁).

2. Gave teachers Bulletin No. V, Sugges- 
tions for Improving Personality. (At the time 
this bulletin was developed it was thought 
that by late in the fall of 1938 some objective 
data collected during 1937-38 on the per- 
sonality of the participating teachers would 
be available and that this information might 
be used in improving the personality of 
teachers during 1938-39. However, these data 
were not then available. It later developed 
that those instruments used to measure per- 
sonality did not discriminate between good 
and poor teachers.)³⁴

3. Suggested to teachers that they attempt
to improve any personality traits which they 
felt were a detriment to their teaching. 

4. Discussed with teachers those areas of 
teaching ability which Rostker found to be 
most closely related to teaching success.³⁵
(The teachers in this study were administered 
in 1937-38 the same tests over these same 
areas as were those in Rostker’s investiga- 
tion.) 

5. Gave teachers Bulletin No. VI, Reading 
for Improvement in the Research Project and 
discussed with them the possibility of improving
their ability in those areas related to


34 Mathews, Qualities Associated with Teaching Efficiency, Unpublished Ph.D. Thesis, University of Wisconsin.

35 L. E. Rostker, “The Measurement of Teaching Ability: Study Number One,” Journal of Experimental Education, XIV (September, 1945), p. 44.




teaching success by reading from the material
suggested in Bulletin No. VI. Discussed the 
type of subject matter covered in each of the 
five books. (The 1938 scores of the partici- 
pating teachers were not available in the 
1938-39 school year. This precluded an indi- 
vidualized program for the improvement of 
teacher qualities.) 

Seventh visit—1. Discussed any problems 
relating to the supervisory program which the 
teachers suggested. (This was always done, 
but in this visit the investigator had fewer 
specific areas which he wished to discuss.) 

2. Gave teachers a copy of the objectives 
for Assignment No. II: Unit II, Community 
Planning Unit.³⁶ Discussed these objectives
in detail. Discussed the organization of sub- 
ject matter and the materials of instruction 
for teaching the unit—drew on experience 
which teachers had with the unit the previous 
year. (It was discovered that there was a uni- 
versal lack of instructional materials for 
teaching this unit.) Suggested that teachers 
begin the collection of materials and plan the 
organization of subject matter for teaching 
the unit. 

3. Arranged the dates for teaching Unit II. 

Eighth visit.—1. Administered the initial
test of Unit II, Community Planning Unit
(U₂).

2. Discussed any problems relating to the 
supervisory program which the teachers suggested.


Ninth visit.—1. Administered the final test
of Unit II, Community Planning Unit (U₂).

2. Discussed any problem relating to the 
supervisory program which the teachers sug- 
gested. 

Tenth visit.—Administered final tests of
Abilities to Organize Research Material (W₁)
and Applying Generalizations to Social
Studies Events (W₃).

Eleventh visit.—Administered final tests of
Scale of Civic Beliefs (W₂), A Test of Civic
Attitudes (H₁), A Test of Civic Information
(H₂), and A Test of Civic Action (H₃).

Twelfth visit—Administered final tests of 
Traxler Silent Reading (TR) and Iowa Test 
of Basic Study Skills (BS). 


SUPERVISION INDIVIDUALIZED 


It must not be inferred from the foregoing 
schedule of supervisory activities that the 
supervision was standardized. The schedule 


36 See original thesis on file in the University Library, University of Wisconsin.







was internally flexible and, since each teacher 
was visited individually, the supervision was 
individualized on the basis of the problems, 
needs, and personal idiosyncrasies of the par- 
ticipating teachers. 

A few illustrations will be given of typical 
factors which the investigator observed and 
which determined the need for and direction
of this individualization. 

1. Teachers varied in the degree to which 
they understood the general application of the 
scientific method to educational measurement. 
Several had no conception whatever of this 
approach. 

2. Several teachers were unfamiliar with 
the purpose and use of general and specific 
educational objectives. 

3. Several teachers did not have an under- 
standing of the use of initial and final tests 
as a means of determining pupil change. 

4. Three teachers were unfamiliar with 
such terms as “standards”, “means”, “norms” 
and “skills”. 

5. Two teachers immediately caught the 
spirit of the supervisory program and the in- 
vestigation and went far beyond the average 
teacher in planning their instruction with 
reference to the objectives. 

6. One teacher did not know what a unit 
was. 
7. Several teachers were unfamiliar with 
the use of several books for developing a unit 
of work. 

8. Several teachers were unfamiliar with 
the use of tests for diagnostic purposes. 

9. One teacher insisted that she had done
a good “job” in teaching Units I and II dur- 
ing 1937-38 even though many of her pupils 
showed negative change scores. 

10. One teacher failed to realize that she 
was unsuccessful in her teacher-pupil rela- 
tionships. 

Such matters as these and the many others 
of which they are typical necessitated an indi- 
vidualized program of help. 




SECTION V 


ANALYSIS AND INTERPRETATION 
OF DATA 


EQUATING GROUPS
Since this investigation was of the equiv- 
alent group type it was necessary first to 
establish the comparability of the control and 
experimental groups. This was done for those 




factors which appeared to be responsible for 
the change occurring between the initial and 
final pupil tests in each of the eight areas. It 
might be assumed that a number of factors 
are responsible for these changes. Among those 
traditionally considered are such factors as 
general intelligence in terms of point scores 
or of mental age, chronological age, socio-eco- 
nomic status, and reading ability. However, it 
appears that some objective evidence should 
be presented to indicate which factors were 
potent for this particular investigation. 

Some indication of the potency of several 
factors likely to be related to pupil change 
may be found by the chi square test. Garrett³⁷
states that the chi square test is “often useful 
for testing whether certain experimentally 
obtained results differ significantly from those 
to be expected by chance; or whether ob- 
tained results agree or disagree with the find- 
ings to be expected by some other hypothesis 
. . . it provides a measure of the probability 
that two sets of data are dependent (definitely
associated) or are independent (significantly
different).”

The medians of initial scores and the 
change scores on the eight pupil measures for 
the combined control and experimental groups 
are presented in Table V. Since the Kuhlman-
Anderson Intelligence Test, Traxler Silent
Reading Test, and Sims Score Card of Socio-
Economic Status were administered only once,
no median change scores are reported for
these tests.³⁸ The results of the chi square
test made between each of twelve factors and
the pupil change on each of eight pupil
measures are found in Table VI.³⁹

Since Hollerith machines were used in 
analyzing the data, it was necessary to con- 
vert all negative change scores into positive 


37 Henry E. Garrett, Statistics in Psychology and Education
(New York: Longmans, Green and Company, 1939), p. 377.

38 Change scores were computed by the formula:
Change Score = (Final Test Score − Initial Test Score)

39 C. H. Goulden, Methods of Statistical Analysis (Minneapolis:
Burgess Printing Company, 1937), presents the following
formula by which the chi square may be calculated:

$$\chi^2 = \frac{(ad - bc)^2 \, T}{T_1 T_2 T_3 T_4}$$

in which

$a$ = number of subjects above median in both factors A and B

$b$ = number of subjects above median in factor A and below median in factor B

$c$ = number of subjects below median in factor A and above median in factor B

$d$ = number of subjects below median in both factors A and B

$T_1$ = sum of $a + c$
$T_2$ = sum of $b + d$
$T_3$ = sum of $c + d$
$T_4$ = sum of $a + b$
$T$ = sum of $T_1 + T_2$ or $T_3 + T_4$
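
The footnoted formula can be sketched in Python as follows. This is an illustrative sketch only: the function name `median_split_chi_square` and the cell counts in the example are hypothetical and are not taken from the data of this study.

```python
def median_split_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi square for a 2 x 2 median-split table, per the footnoted formula.

    a: subjects above the median on both factors A and B
    b: subjects above the median on A and below the median on B
    c: subjects below the median on A and above the median on B
    d: subjects below the median on both factors
    """
    t1 = a + c   # all subjects above the median on B
    t2 = b + d   # all subjects below the median on B
    t3 = c + d   # all subjects below the median on A
    t4 = a + b   # all subjects above the median on A
    t = t1 + t2  # total number of subjects (also equals t3 + t4)
    if 0 in (t1, t2, t3, t4):
        raise ValueError("every marginal total must be positive")
    return (a * d - b * c) ** 2 * t / (t1 * t2 * t3 * t4)


# Hypothetical counts, invented purely for illustration.
example_chi_square = median_split_chi_square(30, 20, 20, 30)
```

With these invented counts the function returns 4.0. Counts split evenly among the four cells give 0, the value expected when the two factors are unrelated.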







scores. This was done by adding certain con- 
stants to each of the original change scores 
for each subject for each of the pupil 
measures. 
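
The conversion described above amounts to adding a fixed constant to every change score. A minimal Python sketch follows. The function name `shifted_change_score` is hypothetical, and the figures (scores of 50, 60, and 70 and a constant of 30) are those of the worked example given in the text.

```python
def shifted_change_score(initial: float, final: float, constant: float) -> float:
    """Change score (final minus initial) with a fixed constant added."""
    return (final - initial) + constant


# Figures from the worked example in the text: the constant is 30.
gain_case = shifted_change_score(initial=60, final=70, constant=30)  # 10 + 30 = 40
loss_case = shifted_change_score(initial=60, final=50, constant=30)  # -10 + 30 = 20

# The gap between the two pupils is the same before and after the shift.
original_gap = (70 - 60) - (50 - 60)  # 20
shifted_gap = gain_case - loss_case   # 20
```

Because the same constant is added to every score for a given measure, the ordering of pupils and the differences between them are unchanged. Only the sign problem for the tabulating machines is removed.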


TABLE V

MEDIANS OF INITIAL TEST AND CHANGE
SCORES FOR COMBINED CONTROL AND
EXPERIMENTAL GROUPS

(Columns: Median Initial; Median Change)

The following constants were used:

(Columns: Pupil Measure; Constant)

Thus a subject whose final score on U, is
70 and whose initial score is 60 has an original
change score of 10. When the constant
of 30 is added, the change score becomes 40.
A subject whose final score is 50 and whose
initial score is 60 will have an original change
score of −10. When the constant is added,
the change score becomes 20. The median
change score in Table V for U, is an original
change score of 6.4. The introduction of these
constants does not change the relative value
of the change scores.

TABLE VI
TABLE OF CHI SQUARES
(Combined Control and Experimental Groups)

Garrett⁴⁰ states that a P of .02 or less indicates
that the factors under consideration are
dependent or definitely associated. With n = 1,
the χ² value for a P of .02 is 5.412. Thus
a χ² value in Table VI which is 5.412 or
more indicates the association of the factors
under consideration.

The only factor, as shown by Table VI,
which meets this test is the initial test score.
The factors of mental age, intelligence quotient,
reading ability, and socio-economic
status as measured by tests used in this study
were not significantly related to the changes
made by pupils. Initial test scores on a specific
pupil measure were not related to the
change scores on any other unit.
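
The criterion quoted from Garrett (a P of .02 or less with one degree of freedom, corresponding to a chi square of 5.412) can be checked numerically. The sketch below is offered only as an illustration. The function names are hypothetical, and it relies on the standard identity that, for one degree of freedom, the upper-tail probability of a chi square value x equals erfc(sqrt(x / 2)).

```python
import math


def chi_square_p_value_df1(chi_square: float) -> float:
    """Upper-tail probability of a chi square value with 1 degree of freedom."""
    if chi_square < 0:
        raise ValueError("chi square cannot be negative")
    return math.erfc(math.sqrt(chi_square / 2.0))


def meets_criterion(chi_square: float, alpha: float = 0.02) -> bool:
    """True when the probability is at or below the chosen level (.02 in the text)."""
    return chi_square_p_value_df1(chi_square) <= alpha


p_at_tabled_value = chi_square_p_value_df1(5.412)  # very close to .02
```

Under this check a value such as 4.0 would not indicate association, since its probability is roughly .05, while any value comfortably above 5.412 would.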

Further validation of this relationship is 
found in certain of the correlations presented 
in Table VII. Neither mental age, intelligence
quotient, nor initial reading ability correlates
significantly with pupil change. The correlations
between these factors and the eight
pupil measures of change range from −.09
to .11. The correlations between the initial
test score and the change on the pupil
measures for the combined control and experimental
groups range from −.37 to −.83.
In addition to confirming the chi square test
showing initial test score to be significantly
related to change on a particular pupil
measure, these correlations provide information
as to the direction of this relationship.
Each correlation is negative, indicating that
the higher the initial test score the lower the
change. This change is independent of mental
age, intelligence, and reading ability.⁴¹

From both the chi square test and the correlations
it appears that the initial test score
is the factor to be given dominant consideration
in equating the control and experimental
groups.

40 Henry E. Garrett, Statistics in Psychology and Education
(New York: Longmans, Green and Company, 1939).

41 ... unsolved problem ...

TABLE VII

CORRELATION OF M. A.; I. Q.; INITIAL READING ABILITY; AND INITIAL TEST SCORES
WITH PUPIL CHANGE SCORES

(Combined Control and Experimental Groups)

(Columns: U₁ Change; U₂ Change; W₁ Change; W₂ Change; W₃ Change; H₁ Change; H₂ Change; H₃ Change)

* For experimental group only.

TABLE VIII

MEANS AND STANDARD DEVIATIONS FOR
CONTROL GROUP ON ALL PUPIL
MEASURES

(Columns: Mean; Standard Deviation)

The means and sigmas for the final, initial, 
and change measures on all pupil tests and 
means and sigmas for mental age, intelligence 
quotient, reading, and socio-economic status 
for the control group are reported in Table 
VIII. 

The same information for the experimental 
group with the addition of initial and change 
means and sigmas on TR and BS is reported 
in Table IX. These measures were not used 
during the control year.⁴²

To provide a certain margin of safety the 
equivalence of the control and experimental 
groups was established by calculating the 
significance of the difference between the 
obtained means of the two groups on mental 
age, intelligence quotient, reading, as well as 
on initial test score. It has already been de- 
termined that the initial test scores correlated 
with change scores. The critical ratios on the 
equating of the control and experimental 
groups on the factors of mental age, intelli- 
gence quotient, reading ability, and initial 
test scores are reported in Table X. The critical
ratios are below 3 in all factors; the
highest critical ratio is in reading ability, but
it has already been established that reading 
ability is not significantly related to change. 
The groups are well equated on all eight 
initial pupil measures. 

The information reported in Table II, 
relative to total school enrollment and class 
size for the control and experimental years 
indicates the comparability of the groups in
these factors. Other factors are reported in 
Table XI. There was relatively little change
made in the equipment and instructional supplies
with which these schools were provided
during the experimental and control years.
Only school No. 14 reported any appreciable
additions of instructional materials.

42 See original thesis on file in the University Library, University of Wisconsin.

TABLE IX

MEANS AND STANDARD DEVIATIONS FOR
EXPERIMENTAL GROUP ON ALL
MEASURES

(Columns: Mean; Standard Deviation)


PERFORMANCE OF THE CONTROL AND EXPERIMENTAL
GROUPS: SEVENTH GRADE
(1937-1938) WITH SEVENTH
GRADE (1938-1939)


The critical ratios arising from the com- 
parison of the pupil changes made by the 
control and experimental groups on the eight 
pupil measures are presented in Table XII. 
If a critical ratio of 3 is accepted as indicative 
of a significant difference (virtual certainty), 
two of the differences (W₁ and W₃) are statistically
significant. If we accept a critical
ratio of 2 as significant, then these two and
two others (U₁ and H₂) are statistically significant.
There are 98 chances in 100 that the
difference in U₁ is significant and 99.5 chances
in 100 that the obtained difference in H₂ is
significant. All of these differences favor the
experimental group. The remaining four
areas (U₂, W₂, H₁, and H₃)
have critical ratios below 1 and are not significantly
different.
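
The arithmetic behind these statements can be sketched in Python. The sketch assumes that a critical ratio is the difference between means divided by the standard error of that difference, and that the "chances in 100" are one-tailed areas under the normal curve. That reading is an assumption of this example, although it reproduces the figures quoted in the text. The function names are hypothetical. The difference of 3.28 and the standard error of 1.54 are the U₁ figures printed in Table XII.

```python
import math


def critical_ratio(difference: float, standard_error: float) -> float:
    """Difference between two means divided by the standard error of the difference."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return difference / standard_error


def chances_in_100(ratio: float) -> float:
    """Area under the normal curve below `ratio`, expressed as chances in 100."""
    return 100.0 * 0.5 * (1.0 + math.erf(ratio / math.sqrt(2.0)))


u1_ratio = critical_ratio(3.28, 1.54)  # about 2.13
u1_chances = chances_in_100(2.13)      # about 98 chances in 100
h2_chances = chances_in_100(2.61)      # about 99.5 chances in 100
```

On this reading a critical ratio of 3 corresponds to roughly 99.9 chances in 100, which is the sense in which the text speaks of virtual certainty.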


On W₃ (Applying Generalizations to Social
Studies Events) the experimental group was 
superior to the controls by a mean change of 
7.16. The critical ratio was 4.75. 


TABLE X

EQUATING OF CONTROL AND EXPERIMENTAL GROUPS ON M. A.; I. Q.; READING;
AND INITIAL TEST SCORES

(Columns: Group; Factor; Mean; S. E. Diff.; Difference of Means)





TABLE XI

EQUIPMENT AND INSTRUCTIONAL SUPPLIES ADDED DURING THE EXPERIMENTAL
PERIOD (1938-1939)

(Column heading: Physical Equipment)

x indicates equipment or supplies added.


TABLE XII
COMPARED PERFORMANCE OF CONTROL AND EXPERIMENTAL GROUPS

(Columns: Measure; Group; Mean; S. E. Diff.; Diff. Means)

The difference in the mean change on W₁
(Abilities to Organize Research Materials)
was 10.79, in favor of the experimental group.
The critical ratio was 3.91.

The experimental group exceeded the controls
in the mean change on U₁ (Safeguarding
Public Health) by 3.28. The critical ratio
was 2.13.

The difference in the mean change on H₂
(A Test of Civic Information) was 1.27, with a
critical ratio of 2.61, in favor of the experimental group.

A slight difference of .18 in the mean
change on H₁ (A Test of Civic Attitudes)
favored the experimental group. The critical
ratio was .482.

On U₂ (Community Planning) the control
group exceeded the experimentals in mean
change by .50 with a critical ratio of .029.

The controls exceeded the experimentals
by 1.53 in mean change on W₂ (Scale of Civic
Beliefs). The critical ratio was .890.

The difference in the mean change on H₃
(A Test of Civic Action) favored the control group.
The difference was .24 with a critical ratio of .956.

In general the statistically significant differences
favored the experimental group. In only
four instances, however, were the differences 
significant. 







PERFORMANCE OF SEVENTH GRADERS (1937- 
1938) WITH THE SAME PUPILS AS 
EIGHTH GRADERS (1938-1939) 


The seventh-grade pupils here studied were
eighth graders in the same schools and under 
the same teachers during the school year of 
1938-1939. The educational objectives were 
the same both years and the same pupil tests 
were employed in measuring pupil change in 
both years. 

A comparison of the performance of these 
pupils during the school year of 1937-1938 
without supervision with their performance 
during the school year of 1938-1939 under 
supervision should give evidence of the effec- 
tiveness of the supervisory program and 
teacher growth. 

The means and standard deviations on the
pupil measures for these pupils when eighth
graders, during 1938-1939, are shown in
Table XIII. The means and standard deviations
for these same pupils when seventh
graders, during 1937-1938, are reported in
Table VIII.

TABLE XIII

MEANS AND STANDARD DEVIATIONS FOR
1938-1939 EIGHTH GRADE ON ALL
MEASURES

(Columns: Mean; Standard Deviation)

Comparative data on the performance of
the same pupils on the same pupil measures
when seventh graders and when eighth graders
are presented in Table XIV. The mean
change during the eighth-grade year (experi- 
mental year) was greater than during the 
seventh-grade year (control year) on all pupil 
measures except H,. 

In all cases except U, the mean initial test 
score in the fall of 1938 is greater than the 
final test score of the spring of 1938. It 
appears that some growth occurred during the 
intervening vacation period in seven of the 
eight measures. In most cases this growth was 
slight. A loss during this period might have 
been expected. The mean gain on W, during 
this period was greater than the mean gain 
made during the previous school year of 
1937-1938. 


EVALUATION OF PERFORMANCE IN READING 
AND BASIC STUDY SKILLS


It will be recalled that the Traxler Silent 
Reading Test (TR) and the Iowa Every-Pupil
Test of Basic Skills: Test B, Vocabulary
and Basic Study Skills (BS) were administered
both as initial and final tests
to both grades seven and eight during the 
experimental year (1938-1939). The means 
and standard deviations on the initial, final, 
and change scores for the seventh and eighth 
grade pupils on these two measures are reported
in Tables IX and XIII, respectively.

No measure of pupil change on these tests 
was available from the control year of this 
investigation since the basic skills test was 
not used at all and the reading test was given 
only in the fall. Therefore, the criterion for 
the evaluation of the supervisory program as 
it relates to these areas had to be found out- 
side of the control group. The norms reported 
by the authors of these tests were employed 
for this purpose. While this was less satisfac- 
tory, perhaps, than a control group, these 
data provide some assistance. 

Norms for the Traxler Silent Reading Test 
as reported by Traxler⁴³ were secured by
testing, near the beginning of the year, several
hundred public-school pupils at the seventh,
eighth, ninth, and tenth grade levels and are
reported in Table XV.

Grade norms for the Iowa Test of Basic
Study Skills were established by testing
22,737 pupils in grades six, seven, and eight.⁴⁴
They are reported in Table XV.

There seems to be no sound reason for
assuming that the norms for rural school
pupils would be higher than those for urban pupils on
either the Traxler Reading or Basic Study
Skills Test employed in this investigation.

43 Arthur E. Traxler, Teacher's Handbook for Traxler Silent
Reading Test; Grades 7 to 10, Forms 1 and 2 (Bloomington,
Illinois: Public School Publishing Company).

44 Iowa Every-Pupil Tests of Basic Skills (Iowa City, Iowa: College
of Education, State University of Iowa, 1938).

TABLE XIV

COMPARATIVE PERFORMANCE OF SAME PUPILS UNDER SAME TEACHERS OVER A TWO YEAR PERIOD

(Columns: Measure; Year; Kind of Score; Mean; Standard Deviation; Difference of Means)

The comparison of the changes made by 
the experimental groups and the changes the 
authors of the tests report as normal change 
are shown in Table XV. 

It will be noted that both grades seven and
eight were grades 7-1 and 8-1 when given
the initial test and grades 7-8 and 8-8 when
given the final test. The grade gain for the
control period was thus 7 months for the 7
month testing interval.

TABLE XV

COMPARISON OF EXPERIMENTAL GROUP CHANGES WITH NORMAL CHANGES ON TRAXLER READING
AND BASIC STUDY SKILLS

(Columns: Mean Raw Score, Initial; Actual Mean Raw Score Change)

1 Figures from Actual Grade Placement.

During the experimental period grade 
seven made a raw score change of 17.2 in 
reading, which, according to the norms, rep- 
resents an actual gain of 16 months, which is 
9 months in excess of that normally expected. 

Grade eight had a raw score change of 17.1 
representing an actual change of 12 months, 
which was 5 months in excess of the gain 
normal for the experimental period. 

On Part I (vocabulary) of the Basic Skills 
test the gain during the experimental period 
was, for both groups, the normally expected 
change. 

On Part II-VI (basic study skills) of the 
Basic Skills test grade seven made a raw 
score change of 15.4. This was equivalent to 
an actual gain of 14 months. This exceeds 
the normal change by 7 months. 

Grade eight on this same test had a raw 
score gain of 13.0. This represented an actual 
gain of 6 months in excess of the normal 
change. 

From an inspection of Table XV it appears 
that the performance of the experimental 
groups on reading and basic study skills was, 
in general, double that normally expected. 
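
The grade-gain figures above can be reproduced with a short Python sketch. It assumes that a placement such as 6-7 means the seventh month of grade six and that each grade counts as ten months. Both are assumptions of this example, chosen because they reproduce the gains reported in the text. The function names are hypothetical. The initial placement of 6-7 is the seventh grade's reading placement as reported in the text, and 8-3 is the final placement that a 16-month gain implies.

```python
def placement_to_months(placement: str, months_per_grade: int = 10) -> int:
    """Convert a grade placement written as 'grade-month' (e.g. '6-7') to total months."""
    grade, month = placement.split("-")
    return int(grade) * months_per_grade + int(month)


def grade_gain_in_months(initial: str, final: str, months_per_grade: int = 10) -> int:
    """Gain in months between two grade placements."""
    return (placement_to_months(final, months_per_grade)
            - placement_to_months(initial, months_per_grade))


normal_gain = grade_gain_in_months("7-1", "7-8")   # 7 months for the testing interval
reading_gain = grade_gain_in_months("6-7", "8-3")  # 16 months for grade seven in reading
excess_gain = reading_gain - normal_gain           # 9 months beyond the normal gain
```

A 16-month gain against a normal gain of 7 months is a little more than double, which is the sense of the statement that performance was, in general, double that normally expected.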

The grade placement of the seventh grade
on the initial reading test was 6-7. The grade
placement of this group on the initial test of
study skills was 6-0. The grade placement of
the eighth grade was 7-4 on the initial test
of basic study skills.

(Table XV, remaining columns: Raw Score Change Normal for Experimental Period; Difference Between Normal and Actual Raw Score Change; Mean Grade Placement, Initial and Final; Actual Grade Gain for Experimental Period; Grade Gain Normal for Experimental Period; Difference Between Normal and Actual Grade Gains)


Since these groups started below their
actual grade placement in these areas, it may
be argued that the reported difference between
the normal and actual grade gain in
months should be discounted. The norms reported
by the authors of the Basic Skills Test
show, however, a raw score gain of 1 point
per month of grade gain. This is likewise true
for the Traxler Reading Test up to grade
8-0. We need not be concerned with the
change beyond this point, since grade eight
had a grade placement of 8-1 on the initial
reading test and was actually in grade 8-1.
Furthermore, the correlations reported in
Table XVI between initial test score on TR
and the change on TR, and between initial
test score on BS and the change on BS, are
only -.08 and -.18 respectively. This indicates
no appreciable relationship in these
data between the initial position and the
change obtained on these tests.


The correlations reported in Table XVII 
between reading change and the change on 
each of nine pupil measures and between 
change on basic study skills and change on 
each of eight pupil measures are generally 
insignificant. 

Conclusions concerning the effectiveness of
the supervisory program in the improvement
of reading skills and of certain other basic
study skills are as follows:





December, 1945] 


IMPROVABILITY OF TEACHERS IN SERVICE 


153 


TABLE XVI
CORRELATION OF INITIAL SCORES WITH PUPIL CHANGE SCORES, 1938-1939 EIGHTH GRADE

                       U1      U2      W1      W2      W3      H1      H2      H3      TR      BS
                     Change  Change  Change  Change  Change  Change  Change  Change  Change  Change
Initial on Measures   -.40    -.14    -.65    -.54    -.48    -.48    -.49    -.40    -.08    -.18


TABLE XVII

CORRELATION OF READING AND BASIC SKILL CHANGE SCORES WITH PUPIL CHANGE SCORES,
1938-1939 SEVENTH AND EIGHTH GRADES COMBINED


Ui U Wi 


2 We 3 He H3 
Change Change Change Change Change Change Change Change 
.01 —.02 
—.02 


Reading Change..._ .02 


£ .05 .00 
Basic Skills Change. .01 


—.05 -10 


1. The pupils in grade seven gained sixteen
months in silent reading rate and comprehension
during a seven-month period.

2. The pupils in grade eight gained twelve
months in silent reading rate and comprehension
during a seven-month period.

3. The pupils in grade seven gained fourteen
months during a seven-month period in
their ability to perform such basic skills as
map reading, interpretation of graphs and
charts, use of basic references, use of the
index, and use of the dictionary.

4. The pupils in grade eight gained thirteen
months during a seven-month period in their
ability to perform these same basic study
skills.

5. This improvement in reading ability and
basic study skills was not significantly related to
changes made during the experimental period
in the other areas of pupil growth and
achievement as measured by the instruments
employed in this investigation.


CHANGE IN TEACHER QUALITIES 


It was pointed out in Section III that four 
general areas of teacher qualities in which 
Rostker found measures most closely related 
to teaching success as measured by the cri- 
terion of pupil change were: 


1. Information, as measured by the American
Council Civics and Government Test,
Form B.

2. Social attitudes, as measured by a Test 
of Social Attitudes of Secondary 
Teachers. 

3. Attitude toward teachers and the teach- 
ing profession, as measured by The 
Scale for Measuring Attitude Toward 
Teaching and the Teaching Profession. 


Ww Hi BS 


Change 
. 65 .09 


—.26 


- 46 
—.08 


—.01 
. 09 


.14 

4. Teacher-pupil relationship as measured 

by A Test of Teacher-Pupil Relation- 
ship. 


The scores made by the teachers on tests 
measuring these four areas are shown in 
Table XVIII. The scores are reported for 
both 1938 and 1939. The changes produced 
during the experimental period are reported
in Table XIX. 

The difference in the means for the Amer- 
ican Council Civics and Government Test for 
these teachers in 1938 and 1939 was 2.87 in 
favor of the experimental year. 

There was only a very small difference in 
the means for the Social Attitudes of Sec- 
ondary Teachers. This difference for Part I 
(social attitudes) was .54 with a critical ratio 
of .369. The difference for Part II (informa- 
tion and knowledge of recent current and 
national affairs) was 12.45 in favor of the 
1938 administration. The critical ratio was 
3.11. 

The difference of means on the Teacher- 
Pupil Relationship was 11.25 in favor of the 
experimental year. The critical ratio was 2.69. 

The change in the attitude toward teachers 
and the teaching profession was insignificant. 
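
The critical ratios quoted above can be checked with a short sketch. It assumes that the critical ratio is the difference of means divided by the standard error of the difference, that the standard errors are 1.46 and 3.99 as they appear in Table XIX, and that a ratio of three or more marks significance, the threshold adopted in the summary of this study. The function names are illustrative only.

```python
def critical_ratio(difference_of_means: float, se_difference: float) -> float:
    """Difference of means expressed in units of its standard error."""
    return difference_of_means / se_difference


def is_significant(ratio: float, threshold: float = 3.0) -> bool:
    """Apply the 'critical ratio of three or more' rule used in the summary."""
    return abs(ratio) >= threshold


# Social Attitudes of Secondary Teachers, Parts I and II.
part_one = critical_ratio(0.54, 1.46)
part_two = critical_ratio(12.45, 3.99)

for label, ratio in (("Part I", part_one), ("Part II", part_two)):
    verdict = "significant" if is_significant(ratio) else "not significant"
    print(f"{label}: critical ratio {ratio:.2f} ({verdict})")
```

Small discrepancies from the printed ratios are to be expected, since the printed means and standard errors are themselves rounded.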

The amount of suggested reading com- 
pleted during the experimental year by the 
participating teachers is shown in Table XX. 
It should be noted that Book No. 5, covering 
the areas measured by the Teacher-Pupil 
Relationship test was read by the largest per- 
centage of teachers and that the critical ratio 
for the test in this area approached signifi- 
cance. 

The type and amount of non-suggested 
reading done by the participating teachers 
during the experimental year is reported in 





JOURNAL OF EXPERIMENTAL EDUCATION [Vol. 14, No.2 


TABLE XVIII
SCORES ON TEACHER MEASURES IN 1938 AND 1939*

                                   Teacher Measures
          American Council   Social Attitudes of Secondary   Attitude Toward
          Civics and                   Teachers              Teachers and     Teacher-Pupil
Teacher   Government          Part I          Part II        Teaching         Relationship
Number    1938  1939        1938  1939      1938  1939       1938  1939       1938  1939
153 142 101 265 314 
83 102 56 244 270 
101 103 67 ne FEED 
144 151 313 315 
99 114 252 286 
112 90 295 282 
74 86 291 293 
51 101 297 321 
117 278 290 
96 299 288 
109 283 
88 282 
90 288 
110 291 
96 303 
112 292 
95 278 


111 
201 
131 
104 
146 
116 106 


* Editor’s Note: These retest scores are interesting in that they throw some light upon the reliability
of these measures, particularly under the conditions under which they were applied. The correlations between
the original test scores and those from the same tests administered a year later were as follows: (a)
for the American Council Civics and Government Test, r=.81; (b) for the Hartmann Social Attitudes of
Secondary School Teachers, Part I (Social Attitudes), r=.11; (c) for the Hartmann Social Attitudes of Secondary
School Teachers, Part II (Social Information), r=.28; (d) for the Torgerson Mental Hygiene Test, r=.61;
and (e) for the Yeager Attitude Toward Teachers and Teaching, r=.45. From Table XIX it can be seen
that the changes for (b) and (e) could not have been due to any mass shift in a single direction; the low
correlations, however, show considerable shift in position. The greatest mean shift was for tests (c) and
(d). The correlation for (d) still remains reasonably constant but not so for (c).


The above calculation was made by Lorraine Smith. 




TABLE XIX 


MEANS, STANDARD ERROR OF DIFFERENCE, DIFFERENCE OF MEANS, AND CRITICAL RATIOS FOR 
1938 AND 1939 TEACHER MEASURES 
Difference 

Teacher Measure Mean S. D. M. Mean S.D.M. S.E. Diff. Means C. R. 
American Council Civ- 

ics and Government _ 110.39 29.75 113.25 29.25 3.78 2.87 . 16 
Social Attitudes of Sec- 

ondary Teachers 

P 10. 00 65. 63 9. 82 1.46 . 54 . 86 


7 Teachers 
Part : 16.98 37.63 5.72 3.99 
Test of Teacher-Pupil ; 

Relationship 7 21.38 291.00 24.68 4.25 
Attitude Toward 

Teachers and Teach- 


1.95 3. 56 2.23 3. 53 







Table XXI. The largest number of books 
read fell within the adult fiction classifica- 
tion, followed next by professional books 
dealing with methods of teaching reading and 
social studies. Professional magazines read 
were inclusively publications of teacher 
organizations. 
TABLE XX
AMOUNT OF SUGGESTED READING COMPLETED 


Number Percentage 
Reading Book Reading Book 


* For title and author of book see original thesis 
on file in the University Library, University of 
Wisconsin. 

Conclusions concerning the changes made 
during the experimental period in the quali- 
ties of the participating teachers measured by 
the instruments employed in this investigation 
are as follows: 

1. The improvement made in information 
relative to government and civics was 
insignificant. 

2. No significant differences were shown in 
social attitudes. 

3. There was a significant loss in the 
knowledge of recent current and national 
affairs. 

4. The improvement in the area of teacher-pupil
relationship approached significance.

5. No significant changes were made in
interest in teaching as evidenced in attitudes
toward teachers and the teaching
profession.


SECTION VI 
SUMMARY AND CONCLUSIONS 


In brief the conclusions concerning the 
effectiveness of the supervisory program as 
drawn from the four comparisons described 
in this investigation are as follows: 

1. The equivalent-group comparison indi- 
cated that the supervisory program was 
effective in areas measured by the following 
four tests of pupil growth and achievement: 
(a) Safeguarding Public Health, (b) Abil- 
ities to Organize Research Material, (c) 
Applying Generalizations to Social Studies 
Events, and (d) Civic Information. 

The areas covered by these tests included 
information, attitudes, and appreciations re- 
garding safeguarding public health; ability 
to organize research material; ability to apply 
certain generalizations to social studies 
events; and civic information. 

2. The equivalent-group comparison indi- 
cated that the supervisory program was not 
effective in areas measured by the following 
four tests of pupil growth and achievement: 
(a) Community Planning, (b) Scale of Civic 
Beliefs, (c) A Test of Civic Attitude, and 
(d) A Test of Civic Action. The areas cov- 
ered by these tests included information, atti- 
tudes, and appreciations regarding community 


TABLE XXI 
TYPE AND AMOUNT OF NON-SUGGESTED READING DONE 


Type of Reading 
Books 
Educational Philosophy 
Fiction (adult) 
Fiction (juvenile) 
Professional 
Educational Psychology 
Methods 


Reading 
Social Studies 


Magazines 
Professional 


* Of those reporting. 


Number of Teachers Reporting

Average Number Read*

Number Read


27 
6 




planning; social and civic attitudes; and civic 
information. 

3. The single-group comparison showed 
that: 


a. In seven of the eight pupil measures the 
mean initial score for the pupils when in 
grade eight was equal to or greater than 
their mean final score when in grade 
seven and, in light of the data available, 
should normally produce small gain. 

b. In all cases except the Test of Civic
Action the mean change made during 
the eighth grade year was greater than 
the mean change made during the sev- 
enth grade year. This increased perform- 
ance may have been due to several 
factors among which is probably the 
experimental factor of supervision. 


4. The supervisory program was effective 
in improving silent reading ability and the 
ability to perform such basic study skills as 
map reading, interpretation of graphs and 
charts, use of the basic references, use of the 
index, and use of the dictionary. The super- 
vised group progressed approximately 100% 
farther in silent reading ability and approxi- 
mately 100% farther in basic study skills 
during the experimental period than was 
normal for that period. 

5. This improvement in reading ability and 
basic study skills was not significantly related 
to changes made during the experimental 
period in the other areas of pupil growth and
achievement as measured by the instruments 
employed in this investigation. 

6. The comparison of the performance of 
the participating teacher on the four teacher 
measures indicated that: 


a. In no area was there significant im- 
provement if one accepts a critical ratio 
of three or more for statistical signifi- 
cance. 

b. There was a positive change in teacher-pupil
relationship which approached
significance.

c. There were insignificant differences in
information relating to government and
civics; in social attitudes; and in interest
in teaching.




7. It would seem from an examination of 
the educational objectives set forth in this 
investigation and the pupil tests purporting 
to measure them, that the supervisory pro- 
gram was effective in producing pupil growth 
and achievement in some of the less traditional
educational objectives.


8. The supervisory program was most 
effective in those areas in which the program 
was most concentrated. 


9. In order to get maximum results, supervision,
it would seem, should therefore be
centered upon a particular area which it is 
desired to improve. 


It should be pointed out that there were a 
number of factors inherent in the organiza- 
tion of this supervisory program which tended 
to operate against the effectiveness of the 
supervision. 

1. The investigator had no official relation- 
ship to any participating teacher. It is recog- 
nized that any exercise of dictatorial attitude 
on the part of a supervisor is detrimental to 
supervision, but it is also held that the exist- 
ence of some official relationship would en- 
hance teacher-pupil-supervisor relationship. 


2. Participating teachers during both the 
control and experimental years followed 
courses of study and outlines prescribed by
the county offices and were preparing their 
eighth grade pupils for the county eighth 
grade examinations. This preparation right- 
fully and obviously received first considera- 
tion. 


3. While an attempt was made to correlate 
the supervisory program as much as possible 
with the regular work of the participating 
schools and teachers, there was due to the 
nature of the objectives of the supervisory 
program an additional program involved. This 
at times for some of the teachers became a 
responsibility which they tended more or less 
to decline. 


4. The experimental year was the second 
year of participation for the teachers. It was 
obvious at times that this novelty factor oper- 
ated in favor of the control year. 





PERSONALITY AND TEACHING EFFICIENCY 


R. E. GotHam 
Beloit Public Schools 


SECTION I 
THE PROBLEM 


There seemed to be considerable reason for
believing, from general observation and the
preceding studies herein reported, that personality
is an important factor in teaching
efficiency. The study here reported was undertaken
to explore this possibility. An attempt
was made to determine the personal qualities
essential to teaching success as measured
against four criteria of teaching success,
namely, (1) teacher rating scales, (2) measures
of qualities commonly associated with
teaching success, (3) changes produced in
pupils, and (4) a composite of the foregoing.

The personality of each participating 
teacher was measured through the use of a 
variety of personality rating scales; in addi- 
tion, each participating teacher filled out 
several personality inventories which pre- 
sumably reflected her particular personality 
pattern. The numerical indices obtained from 
these measures were studied in relation to the 
criteria of teaching success employed in the 
investigation. 

It was hoped that the results might furnish 
evidence in answer to the following questions: 


(1) What measurable relationship exists
between a teacher’s personality, as appraised
through the use of rating scales and
as measured by certain tests of personality,
and her ability to produce measurable
changes in her pupils?

(2) What inter-relationships are there
among different measures of personality?

(3) To what extent can measurable pupil
changes be predicted from a composite of
personality measures?


SECTION II 
DESIGN OF THE EXPERIMENT 


This investigation, as outlined in the original
plan for the study of teaching ability,
was a part of Study Number Two. The design
of the experiment, the measuring instruments
used and the procedures followed were
the same as those already reported.1

In the first investigation1a only teachers of
the 7th, 8th, or combined 7th and 8th grades
were used. This furnished evidence concerning
the teaching ability of teachers in village
and first class state graded schools (four-room
schools), where the teacher taught one, or at
the most two grades. The second study called
for a sampling of teachers in one- and two-room
rural schools. Teachers of this type of
school comprise one of the largest groups of
teachers in the state, and hence merited consideration
in such a study.

The following factors were taken into con- 
sideration in determining the grade and sub- 
ject area selected: 


(1) Citizenship at the 7th and 8th grade 
level was being taught during the 1937-1938 
school year in all the rural schools. 

(2) Pupil change in the area of the social 
studies is considered of growing importance 
among educational objectives. 

(3) The objectives of Citizenship were 
broad enough to allow for considerable vari- 
ability in the techniques of teaching used by 
the teachers selected. 


(4) More desirable measuring instruments 
were available in this area and at this grade 
level than in certain other areas and grade 
levels. 

The following limitations were set up in 
the selection of participating schools: 


(1) That the school be a one- or two-room
rural school, employing one and no more than
two teachers.

(2) That Citizenship be taught at the 7th 
and 8th grade level throughout the course of 
the school year. 

(3) That at least 5 pupils must be en- 
rolled in the combined 7th and 8th grades. 

(4) That the teacher must be willing to 
participate in the investigation. 

1 J. F. Rolfe, “The Measurement of Teaching Ability: Study
Number Two,” (September, 1945), pp. 52-74.

1a L. E. Rostker, “The Measurement of Teaching Ability:
Study Number One,” op. cit., p. 11.




Shortly after the opening of the school year 
in the fall of 1937 a large number of schools 
were visited and the proposed plan described 
to the teachers. The teachers in those schools 
meeting the above requirements were invited 
to participate. Sufficient teachers agreed to 
participate and a group of 72 schools located 
in Eastern Dane, Western Dane and Colum- 
bia Counties within a radius of 35 miles of 
Madison were selected.” 


SECTION III 


THE DEVELOPMENT OF THE CRI- 
TERION OF TEACHING ABILITY 


The major criterion of teaching ability 
employed in this investigation was that of the 
measurable changes produced in pupils. In 
order to develop this criterion it was neces- 
sary to obtain for each teacher a composite 
of pupil change score representing teaching 
ability. The method of doing this has already 
been discussed by Rolfe.** 


THE PROBLEM 


The specific purpose of this investigation 
as outlined in the introduction of this report 
is to determine the relationship, if any, be- 
tween certain measures of personality, em- 
ployed in this investigation, and the four 
criteria of teaching ability herein developed. 
The four criteria employed in this investiga- 
tion were: 

(1) Criterion of pupil change (major criterion)—Three
batteries of tests were administered
to each pupil. Two of the batteries of
tests were designed to measure long-term or 
school year changes in pupils; the third bat- 
tery was designed to measure unit or short 
term changes. The three batteries purported 
to measure information, interests, attitudes, 
beliefs and abilities developed in the Social 
Studies area. 

(2) Tests of qualities commonly associated 
with teaching efficiency —To obtain an index 
of each teacher’s professional information, in- 
terests, attitudes, understandings, personal 
qualifications, and abilities, a battery of thir- 
teen tests was administered to each teacher. 
The most valid and reliable available meas- 
ures of the qualities thought to be associated 
with teaching were employed. 


See J. F. Rolfe, op. cit.

J. F. Rolfe, op. cit., pp. 58-68.




To obtain a composite score for each 
teacher on these qualities, the scores on each 
measure were converted to standard scores.
The mean of the distribution for each 
measure was subtracted from the score and 
the remainder divided by the standard devia- 
tion of the distribution. The standard scores 
for the thirteen measures were then compos- 
ited. This numerical index represented the 
criterion of qualities commonly associated 
with teaching success. 
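
The conversion just described can be sketched in a few lines. The sketch rests on assumptions the text does not settle: it uses the population standard deviation, it composites by simple summation of the standard scores, and the raw scores shown are hypothetical rather than taken from the study.

```python
from statistics import mean, pstdev


def standard_scores(raw_scores):
    """Subtract the mean from each score and divide by the standard deviation."""
    center = mean(raw_scores)
    spread = pstdev(raw_scores)  # population S.D.; an assumption
    return [(score - center) / spread for score in raw_scores]


def composite_scores(scores_by_measure):
    """Sum each teacher's standard scores across all measures (assumed rule)."""
    standardized = [standard_scores(scores) for scores in scores_by_measure.values()]
    return [sum(teacher_scores) for teacher_scores in zip(*standardized)]


# Hypothetical raw scores for four teachers on two measures.
hypothetical_scores = {
    "measure_a": [110, 95, 130, 105],
    "measure_b": [62, 70, 58, 66],
}

composites = composite_scores(hypothetical_scores)
for teacher, value in enumerate(composites, start=1):
    print(f"teacher {teacher}: composite {value:+.2f}")
```

Standardizing first keeps a measure with a wide raw-score range from dominating the composite.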


(3) A composite of teacher ratings—To 
obtain an index of each teacher’s teaching 
abilities, aptitudes, understandings, personal 
qualifications, and interests, as appraised by 
an observer of the teacher at work, five rating 
scales were employed for rating each teacher. 
Three raters, the superintendent, the super- 
visor, and the field research representative 
rated each teacher, on different visits, em- 
ploying the five rating scales. A composite 
score was obtained by converting each score 
into a standard score employing the same 
technique as used in obtaining the criterion 
of tests of qualities commonly associated with 
teaching success. 


(4) A composite of the foregoing—To 
obtain an index of each teacher’s profes- 
sional information, interests, attitudes, under- 
standing, personal qualification and abilities, 
together with the ratings of the superintend- 
ent, supervisor and field research worker of 
her teaching abilities, aptitudes, understand- 
ings, personal qualifications and interests as 
obtained through the use of three rating 
scales, a composite was made of all measures 
employed. A single index of general teaching 
ability was determined. 

In the analysis of these data an attempt 
will be made to answer the following ques- 
tions: 

(1) What relationship is there between 
each test of personality employed in this in- 
vestigation and the criterion of: 

(a) pupil change 

(b) tests of qualities commonly associated 

with teaching success 

(c) teacher rating scales 

(d) composite of tests and rating scales 

(2) What relationship is there between 
each personality rating scale employed in this 
investigation and the criterion of: 







(a) pupil change 
(b) tests of qualities commonly associated 
with teaching success 

(c) teacher rating scales 

(d) composite of tests and rating scales 

(3) What relationship is there between
a composite of all tests of personality employed
in this investigation and the criterion
of:
(a) pupil change
(b) tests of qualities commonly associated
with teaching success
(4) What relationship is there between 
a composite of all personality rating scales 
employed in this investigation and the cri- 
terion of: 
(a) pupil change 
(b) tests of qualities commonly associated 
with teaching success 
(5) What relationship is there between 
a composite of all personality tests and rating 
scales employed in this investigation and the 
criterion of: 
(a) pupil change 
(b) tests of qualities commonly associated 
with teaching success 
(6) What inter-relationship is there be- 
tween all measures of personality employed 
and the criterion of pupil change? 


SECTION IV 


ANALYSIS AND INTERPRETATION 
OF DATA 


In the attempt to determine the relation- 
ship of the personality factor to teaching suc- 
cess it was first necessary to develop the 
major criterion of teaching ability, namely 
pupil change. Rolfe** describes fully the 
process of developing this single index of 
teaching ability as measured by pupil change. 
Lists of the criterion scores for each teacher 
are given in the original thesis on file at the 
University Library, University of Wisconsin. 

Three tests of personality were admin- 
istered to each teacher. These tests* were: 

(1) The Bernreuter Personality Inventory 

(2) The Washburne Social Adjustment 

Inventory 
(3) The Rudisill Scale for the Measurement
of the Personality of Elementary
School Teachers




To determine the relationship of these 
measures of personality to the four criteria 
of teaching success coefficients of correlation 
were calculated between each measure and
the different criteria. 
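
A coefficient of correlation of the kind reported in the tables that follow can be computed as sketched here. The sketch assumes the ordinary product-moment (Pearson) coefficient, which the text does not name explicitly, and the paired scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and of squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical personality-test scores and criterion scores for six teachers.
personality_scores = [42, 55, 38, 61, 47, 50]
criterion_scores = [0.3, -0.1, 0.4, 0.2, -0.5, 0.1]

r = pearson_r(personality_scores, criterion_scores)
print(f"r = {r:.2f}")
```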


CORRELATIONS WITH THE CRITERION
OF PUPIL CHANGE


The correlations observed between these 
three personality tests and the major crite- 
rion of pupil change are given in Table I. 


TABLE I

COEFFICIENTS OF CORRELATION BETWEEN THREE
PERSONALITY TESTS AND THE CRITERION
OF PUPIL CHANGE

Test
Bernreuter Personality Inventory N with criterion of pupil change
Bernreuter Personality Inventory S with criterion of pupil change
Bernreuter Personality Inventory D with criterion of pupil change
Washburne Social Adjustment Inventory with criterion of pupil change
Rudisill Scale for the Measurement of the Personality of Elementary School Teachers

Inspection of the correlations (Table I) 
between the three personality tests employed 
in this investigation and the criterion of pupil 
change reveals no significant relationship. 

The Rudisill Scale for the Measurement of 
the Personality of Elementary School Teach- 
ers purports to measure twenty personal traits 
assumed to be related to teaching efficiency. 
The score for each teacher on each of the 
twenty traits was correlated with the criterion 
of pupil change and is reported in Table II. 

No significant relationship is observed between
any of the twenty traits purportedly
measured by the Rudisill Scale for the Measurement
of the Personality of Elementary
School Teachers and the criterion of pupil
change, with the possible exception of progressiveness
(.30), refinement (.25), interest
in work (-.35) and adaptability (-.23).
Why the latter two are negatively correlated
is not readily apparent. Rostker, Rolfe, and
La Duke secured conflicting results for the
Yeager scale for measuring interest in teachers
and teaching.
The criterion ‘of qualities commonly asso- 
ciated with teaching success—The criterion 
of qualities commonly associated with teach- 
ing success consists of a composite of thirteen 







TABLE II 


COEFFICIENTS OF CORRELATION BETWEEN 
TWENTY TRAITS OF THE RUDISILL SCALE FOR 
THE MEASUREMENT OF THE PERSONALITY 
OF ELEMENTARY SCHOOL TEACHERS AND 
THE CRITERION OF PUPIL CHANGE 


Accuracy 
Adaptability 
Considerateness 


y 
Understanding of Children 
Refinement 


measures. The scores of each test were con- 
verted to a standard score before compositing. 
The thirteen measures were the following:* 


(1) The American Council on Education 
Psychological Examination 

(2) The Teachers College Psychological 
Examination 

(3) American Council Civics and Gov- 
ernment Test 

(4) Stanford Teaching Aptitude: T. R. 
(teaching and research ability) 

(5) Stanford Teaching Aptitude: T. A. 
(teaching and administrative ability) 

(6) Stanford Teaching Aptitude: A. R. 
(administrative and research ability) 

(7) Yeager—Scale for Measuring Atti- 
tude Toward Teachers and the 
Teaching Profession 

(8) Lewerenz—Steinmetz — Orientation 
Test 


(9) Torgerson— Test of Teacher-Pupil 
Relationship 
(10) Morris Trait Index L 
(11) Wrightstone—Scale of Civic Beliefs 
(12) Hartman—Test of Social Attitudes 
of Secondary School Teachers 
(12a) Part I, Information (Hartman) 
(12b) Part II, Attitude (Hartman) 


*A full description of each test will be found in L. E.
Rostker, op. cit., pp. 15-19.




The correlations observed between the 
three personality tests and the criterion of 
qualities commonly associated with teaching 
efficiency are given in Table III. 

Inspection of Table III reveals no significant
relationship between any of the three
tests of personality and the criterion of
teacher tests, except for the Washburne Social
Adjustment Inventory, where a correlation of
.47 is observed. This result is in general
agreement with that found elsewhere.


TABLE III

COEFFICIENTS OF CORRELATION BETWEEN THE
THREE TESTS OF PERSONALITY AND THE
CRITERION OF TESTS OF QUALITIES
COMMONLY ASSOCIATED WITH
TEACHING EFFICIENCY

Test
Bernreuter Personality Inventory N with the criterion of professional teacher tests
Bernreuter Personality Inventory S with the criterion of professional teacher tests
Bernreuter Personality Inventory D with the criterion of professional teacher tests
Washburne Social Adjustment Inventory with the criterion of professional teacher tests
Rudisill Scale for the Measurement of the Personality of Elementary School Teachers with the criterion of professional teacher tests


CRITERION OF TEACHER RATINGS 


The criterion of teacher ratings consisted 
of the composite of ratings from three teacher 
rating scales as appraised by three observers, 
namely, the superintendent, supervisor, and 
the field research worker. The teacher rating 
scales used were the following:5


(1) Michigan Teacher Rating Scale 

(2) Torgerson Diagnostic Teacher Rating 
Scale of Instructional Activities 

(3) Almy-Sorenson Rating Scale for 
Teachers 


The correlation coefficients observed between
the three personality tests employed in
this investigation and the composite of three 
teacher ratings are given in Table IV. 


From Table IV it may be observed that 
the Washburne Social Adjustment Inventory 
is the only personality test significantly re- 
lated to the criterion of teacher rating scales. 
The coefficient of correlation is .40. 


5A full description of each teacher rating scale will be 
found in L. E. Rostker, op. cit., pp. 19-20. 







TABLE IV 


COEFFICIENTS OF CORRELATION BETWEEN THE 
THREE TESTS OF PERSONALITY AND A 
COMPOSITE OF TEACHER RATINGS 


Test r 
Bernreuter Personality Inventory N with the 
criterion of Teacher Rating .16 
Bernreuter Personality Inventory S with the 
criterion of Teacher ratings .01 
Bernreuter Personality Inventory D with the 
criterion of Teacher Rating .19 
Washburne Social Adjustment Inventory
with the criterion of teacher ratings .40
Rudisill Scale for the Measurement of the
Personality of Elementary School Teachers
with the criterion of teacher ratings .16


Since the Almy—Sorenson Rating Scale for 
Teachers contains a noticeable proportion of 
items generally thought to be measures of 
personal qualities, this scale was deleted from 
the composite of the teacher rating scales and 
the correlation coefficient determined using 
the two remaining scales. 

The coefficients of correlation observed between
the three personality tests employed in
this investigation and a composite of the two
remaining rating scales are given in Table V.


TABLE V 


COEFFICIENTS OF CORRELATION BETWEEN THREE 
TESTS OF PERSONALITY AND A COMPOSITE 
OF TWO RATING SCALES


Tests r 

Bernreuter Personality Inventory N with the 

criterion of teacher rating - Ay 
Bernreuter Personality Inventory S with the 

criterion of teacher rating. .04 
Bernreuter Personality Inventory D with the 

criterion of teacher rating - .21 
Washburne Social Adjustment Inventory 

with the criterion of teacher ratings __ - .40 
Rudisill Scale for the Measurement of the 

Personality of Elementary School Teach- 

ers with the criterion of teacher ratings. .—.17 


It will be noted from Table V
that the only significant correlation is found
between the Washburne Social Adjustment
Inventory and the composite of the two rating
scales.

A further comparison was made between 
the personality tests and a composite of three 
personality rating scales. These personality 
scales were:⁶

(1) An unstandardized four-point Personality Rating Scale

(2) An unstandardized thirty-three point Scale for Evaluating the Fitness of Teachers

(3) The Almy-Sorenson Rating Scale for Teachers

⁶ A full description of each rating scale will be found in the original thesis on file in the University Library, University of Wisconsin.


The correlations observed between the 
three personality tests employed in this in- 
vestigation and a composite of the three 
personality scales employed are given in 
Table VI. 


TABLE VI 


COEFFICIENTS OF CORRELATION BETWEEN THREE
TESTS OF PERSONALITY AND A COMPOSITE
OF THREE PERSONALITY SCALES

Test                                                          r

Bernreuter Personality Inventory N with a
  composite of three personality rating scales              .06
Bernreuter Personality Inventory S with a
  composite of three personality rating scales             -.01
Bernreuter Personality Inventory D with a
  composite of three personality rating scales              .09
Washburne Social Adjustment Inventory
  with a composite of three personality
  rating scales                                             .47
Rudisill Scale for the Measurement of the
  Personality of Elementary School Teachers
  with a composite of three personality
  rating scales                                     [illegible]


From Table VI it is observed that no significant relationship is evident between the tests of personality employed in this investigation and a composite of three unstandardized personality rating scales, except for the Washburne Social Adjustment Inventory, with a coefficient of correlation of .47.

Because the Almy-Sorenson Rating Scale for Teachers contained items not generally included in personality rating scales, a second composite consisting of the two remaining personality rating scales was developed. The correlations obtained between the three personality tests and the composite of the two personality rating scales are reported in Table VII.

An inspection of Table VII reveals but one 
significant coefficient, namely, that between 
the Washburne Social Adjustment Inventory 
and a composite of two personality rating 
scales, which was .52. 

Having observed the relationship of the three personality tests to the various criteria of teaching success employed in this investigation, the writer next examined the relationship of the unstandardized Personality Rating Scales used in this study to the same criteria.







TABLE VII 


COEFFICIENTS OF CORRELATION BETWEEN THREE
TESTS OF PERSONALITY AND A COMPOSITE
OF TWO PERSONALITY SCALES

Test                                                          r

Bernreuter Personality Inventory N with a
  composite of two personality rating scales        [illegible]
Bernreuter Personality Inventory S with a
  composite of two personality rating scales        [illegible]
Bernreuter Personality Inventory D with a
  composite of two personality rating scales        [illegible]
Washburne Social Adjustment Inventory
  with a composite of two personality
  rating scales                                             .52
Rudisill Scale for the Measurement of the
  Personality of Elementary School Teachers
  with a composite of two personality
  rating scales                     -.1[second digit illegible]


The unstandardized Personality Rating 
Scales employed in this study were: 


(1) An unstandardized four-point Person- 
ality Rating Scale 

(2) An unstandardized thirty-three point 
Trait Scale for Evaluating the Per- 
sonal Fitness of Teachers 


The correlations between the personality 
rating scales and the major criterion of pupil 
change are listed in Table VIII. 


TABLE VIII 


COEFFICIENTS OF CORRELATION BETWEEN TWO
PERSONALITY RATING SCALES AND THE
CRITERION OF PUPIL CHANGE

Test                                                          r

Unstandardized four-point Personality Rating
  Scale with the criterion of pupil change                  .30
Unstandardized thirty-three point Scale for
  Evaluating the Personal Fitness of Teachers
  with the criterion of pupil change                        .35


The coefficients of correlation between the unstandardized Personality Rating Scales and the criterion of pupil change (Table VIII) are of interest on two counts. They are larger than those found by Rostker, yet they are not large enough to indicate considerable agreement between the criterion of pupil change and supervisory ratings. (Also see Table XI.)


The relationship of the Unstandardized 
Personality Rating Scale with the criterion of 
qualities commonly associated with teaching 
success was studied. The correlations are re- 
ported in Table IX. 




TABLE IX 


COEFFICIENTS OF CORRELATION BETWEEN TWO
PERSONALITY RATING SCALES AND THE
CRITERION OF TEACHER TESTS

Measure                                                       r

Unstandardized four-point Personality Rating
  Scale with the criterion of qualities commonly
  associated with teaching success                  [illegible]
Unstandardized thirty-three point Scale for
  Evaluating the Personal Fitness of Teachers
  with the criterion of qualities commonly
  associated with teaching success                          .14


No significant relationship is noted between 
the unstandardized Personality Rating Scales 
and the criterion of qualities commonly asso- 
ciated with teaching success. 


The coefficients of correlation obtained be- 
tween the unstandardized rating scales em- 
ployed in this study and the criterion of 
teacher ratings are reported in Table X. 


TABLE X 


COEFFICIENTS OF CORRELATION BETWEEN TWO
UNSTANDARDIZED PERSONALITY RATING
SCALES AND THE CRITERION OF
TEACHER RATING SCALES

Measure                                                       r

Unstandardized four-point Personality Rating
  Scale with the criterion of teacher ratings       [illegible]
Unstandardized thirty-three point Scale for
  Measuring the Personal Fitness of Teachers
  with the criterion of teacher ratings                     .87


The high correlations between the unstandardized Personality Rating Scales and the criterion of teacher ratings are doubtless due to the similarity of the areas measured and to the “halo effect” in the use of these scales.


COMPOSITES OF VARIOUS CRITERIA


To explore further the relationship of the various measures of personality employed in this investigation to the criterion of pupil change, the several teacher tests and teacher rating scales were composited and related to the several criteria. The relations of these composites to the criterion of pupil change are reported in Table XI.

No significant relationship is observed be- 
tween the criterion of pupil change and the 
various composites of personality measures, 
except possibly for the unstandardized per- 
sonality and personal fitness rating scales 
employed in this study. 




TABLE XI 


COEFFICIENTS OF CORRELATION BETWEEN
SEVERAL COMPOSITES OF PERSONALITY
MEASURES AND PUPIL CHANGE

Composite of                                                  r

Tests of Qualities Commonly Associated with
  Teaching Success with criterion of pupil change           .13
Personality Tests and Rating Scales with
  criterion of pupil change                                 .27
Personality Tests with criterion of pupil
  change                                            [illegible]
Unstandardized Personality and Personal
  Fitness Rating Scales with criterion of
  pupil change                                      [illegible]


The relationship between the various composites of the measures of personality and the criterion of qualities commonly associated with teaching success was also studied. The coefficients of correlation observed are reported in Table XII.


TABLE XII 


COEFFICIENTS OF CORRELATION BETWEEN
VARIOUS COMPOSITE MEASURES
OF PERSONALITY

Composite of                                                  r

Personality Tests with criterion of teacher
  tests of qualities commonly associated
  with teaching success                             [illegible]
Unstandardized Personality Rating Scales
  with criterion of teacher tests of qualities
  commonly associated with teaching success         [illegible]
Personality Tests and Unstandardized
  Personality Rating Scales with criterion of
  teacher tests of qualities commonly
  associated with teaching success                  [illegible]


Two of the composites in Table XII show 
significant correlations. 


The coefficients of correlation between a 
composite of the measures of personality and 
a composite of teacher rating scales are re- 
ported in Table XIII. 


TABLE XIII 


COEFFICIENTS OF CORRELATION BETWEEN
VARIOUS COMPOSITES OF TESTS AND
PERSONALITY SCALES

Measure                                                       r

Personality Tests with a composite of
  teacher ratings                                   [illegible]
Unstandardized Personality Rating Scales
  with a composite of teacher ratings               [illegible]
Personality Tests and the Unstandardized
  Personality Ratings combined with a
  composite of teacher ratings                      [illegible]




The two high correlations observed for items two and three in Table XIII are doubtless due to overlapping items and the “halo effect.”

In order to determine the inter-relationship 
between the various measures of personality 
employed in this investigation including the 
twenty traits included in the Rudisill Scale 
for the Measurement of the Personality of 
Elementary School Teachers, intercorrelations 
between all measures employed were calcu- 
lated. These intercorrelations are given in 
Table A,, of the original thesis. 


No significant relationship is observed 
among the various indices except those meas- 
uring similar traits or areas. These larger 
coefficients of correlation are possibly due to 
the “halo effect” in the ratings of teachers 
by supervisory officials.

The intercorrelations between the criterion 
of pupil change and the various composites 
of personality tests and rating scales are 
given in Table XIV. Some of these are mod- 
erately significant. 

The multiple R for these composites with the criterion of pupil change was .40. The composites seem not to improve the predictive efficiency of the data over that secured from the zero-order correlations for a composite of teacher rating scales. The failure of the multiple R to improve the prediction probably arises in part from the high intercorrelations between the ratings and in part from the small amounts contributed by measures other than ratings. The composites were as follows:


(1) Composite of Teacher Rating Scales

(2) Composite of Personality Tests and Rating Scales

(3) Composite of Michigan and Torgerson Rating Scales

(4) Composite of Personality Tests

(5) Composite of Measures of Qualities Commonly Associated with Teaching Success

(6) Composite of Personality Rating Scales

(7) Composite of Personal Fitness and Personality Rating Scales


The regression equation for these several composites in standard score form was as follows:

$$Z_{0} = -.16Z_{1} - .13Z_{2} + .23Z_{3} + .33Z_{4} - .74Z_{5} + .41Z_{6} + .29Z_{7}$$





TABLE XIV


INTERCORRELATIONS BETWEEN THE VARIOUS COMPOSITES OF TEACHER MEASURES AND THE
CRITERION OF PUPIL CHANGE

[Table XIV: the column headings (2) through (7) and the row labels identify the composites of teacher measures; the coefficients are not legible in this copy.]


SECTION V 
SUMMARY AND CONCLUSIONS 


The results of this investigation seem to 
furnish evidence in support of the following 
conclusions: 


(1) No significant relationship is observed 
between the criterion of pupil change and the 
three personality inventories employed,
namely:


(a) The Bernreuter Personality Inventory 
N—Neurotic Tendency (—.14) 
S—Self Sufficiency (—.11) 
D—Dominance-Submission (.04) 


(b) The Washburne Social Adjustment 
Inventory (.06) 


(c) The Rudisill Scale for the Measure- 
ment of the Personality of Elementary 
School Teachers (.03) 


(2) No significant relationship is observed 
between the criterion of pupil change and the 
various different personal traits measured in- 
dividually by the Rudisill Scale for the 
Measurement of the Personality of Elemen- 
tary School Teachers, with the possible ex- 
ception of interest in work (—.35), progres- 
siveness (.30), refinement (.25) and adapt- 
ability (—.23). 


(3) A moderately significant relationship 
(namely .47) is observed between the Wash- 
burne Social Adjustment Inventory and the 
criterion of teacher tests. No significant rela- 
tionship was observed with respect to the two 
remaining tests. 


(4) A moderate relationship is also ob- 
served (namely .40) between the Washburne 
Social Adjustment Inventory and the crite- 
rion of teacher ratings obtained from a com- 
posite of the following rating scales: 


(a) Torgerson Diagnostic Teacher Rating 
Scale 


(b) Michigan Teacher Rating Scale 


(c) Almy—Sorenson Rating Scale for 
Teachers 


No significant relationship is observed be- 
tween the remaining two personality inven- 
tories and the criterion of teacher ratings. 


(5) Some relationship is observed between 
the criterion of pupil change and the follow- 
ing teacher rating scales: 


(a) Almy-Sorenson Rating Scale for 
Teachers (.36) 

(b) Michigan Teacher Rating Scale (.39) 

(c) Torgerson Diagnostic Teacher Rating 
Scale (.43) 

(d) An unstandardized four-point Person- 
ality Rating Scale (.30) 

(e) An unstandardized thirty-three point Trait
Scale for Evaluating the Personal Fitness
of Teachers (.35)


(6) No significant relationship was observed between the tests of personal qualities and the personality rating (.13).


(7) A multiple correlation of .40 was found between a composite of teacher personality measures and pupil change. Because of the preliminary character of the work in this area, no advantage is gained from application of the multiple correlation technique.

(8) One of the significant findings of this investigation is the lack of agreement found among the several criteria of teaching efficiency. The correlations among these were as follows:

(a) Between pupil change and tests of 
qualities commonly associated with 
teaching success (.13) 

(b) Between pupil change and personality 
tests and scales (.27) 




(c) Between the measures of personality 
and qualities commonly associated 
with teaching efficiency (.32) 

(d) Between a composite of the Michigan 
and Torgerson Rating Scales and 
Pupil change (.40) 


While the latter coefficients of correlation 
are large enough to be statistically significant,
the criterion of pupil change apparently 
measures something different from that 
measured by teacher ratings and tests of 
qualities commonly associated with teaching 
efficiency. 





A FACTOR ANALYSIS OF TEACHER ABILITIES 


A. G. HELLFRITZSCH 
Chevy Chase, Maryland 


SECTION I 


It seemed clear to the writer from a survey 
of previous studies of teacher abilities that 
there was much overlapping in the qualities 
and abilities measured by the instruments 
commonly employed for this purpose and 
that the work in this field might be greatly 
facilitated by some sort of factor analysis. 
The problem of this study is to determine 
the number and kinds of factors common to 
some twenty-five measures of teacher abilities 
applied to two groups of teachers, one group 
teaching social studies in the eighth grade of 
Wisconsin rural state graded schools, and an- 
other group teaching the same subject in one- 
room rural schools. More specifically the 
study will attempt to answer the following 
questions: 1. How many common factors are 
there in a complex of measures frequently 
used in investigating the nature of teaching 
ability? 2. What are these factors? 3. Which 
factors are measured by the various tests? 
4. Which factors are related to pupil growth? 
5. Which factors are related to supervisory
ratings of teachers?


SECTION II 
THE PLAN: FACTOR ANALYSIS 


INTRODUCTION 


The method of analysis to be used in this study had its beginning in the work of Spearman¹ and his associates, who sought to explain the generally positive correlations between various mental tests in terms of a general factor of intelligence which was common to all the tests used. As larger and more varied batteries of tests were analyzed, it was found necessary to postulate the existence of more than one factor in order to account for the intercorrelation coefficients between the various tests. This led to the development of multi-factor methods of interpreting the correlation table for a battery of tests.

¹ C. Spearman, “The Theory of Two Factors,” Psychological Review, (1914); C. Spearman, The Abilities of Man (New York: Macmillan Co., 1927).


SOME CONCEPTS IN FACTOR ANALYSIS


To facilitate discussion of the common 
factor problem and to illustrate very briefly 
to the uninitiated some of the terms used in 
this field, consider the fictitious data given in 
Table I. Six tests correlated with each other 
yield the correlation matrix R there given. 

To factor this matrix means to find a common factor matrix, more specifically referred to as a factorial matrix, F, which can satisfactorily reproduce the correlation matrix R. One factorial matrix F which reproduces the entries in R perfectly when postmultiplied by its transpose F’ is

[Matrix F: six rows (tests 1 to 6) by three columns (common factors A, B, and C); the entries are not legible in this copy.]


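The transpose, F’, is written by merely interchanging the rows and columns of F, thus

[Matrix F’: three rows (common factors A, B, and C) by six columns (tests 1 to 6); the entries are not legible in this copy.]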


This factorial matrix F and its transpose will reproduce the correlation matrix R by the following equation

FF’ = R
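TABLE I


CORRELATION MATRIX

[Table I: the 6 x 6 correlation matrix R for the six fictitious tests, with communalities in parentheses on the diagonal; only scattered entries, such as (.64), (.52), .32, .30, .42, and .28, are legible in this copy.]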


To verify this fact in the above example we need only to multiply F on the right by F’ according to the rules of matrix multiplication. In matrix multiplication the element in the ith row and jth column of R is equal to the sum of the pair products of the elements in the ith row of F by the elements in the jth column of F’.

Another way of saying this is that the correlation coefficient between two tests is the inner product between the corresponding rows of F.
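The row-by-row rule just described can be checked numerically. What follows is only a minimal sketch in Python. The loadings in `F` are hypothetical values, chosen to agree with the properties stated in the text (six tests, three common factors, tests 1, 3, and 6 of complexity 2, test 6 with communality .74); they are not the entries of the original matrix. The helper name `reproduce_correlations` is likewise an assumption made for illustration.

```python
# Hypothetical loadings: rows are tests 1-6, columns are common factors A, B, C.
# Illustrative values only; they are not the entries printed in the article.
F = [
    [0.5, 0.0, 0.6],
    [0.0, 0.8, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.0, 0.9],
    [0.7, 0.0, 0.0],
    [0.0, 0.7, 0.5],
]


def reproduce_correlations(loadings):
    """Return FF': entry (i, j) is the inner product of rows i and j of F."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in loadings]
        for row_i in loadings
    ]


R = reproduce_correlations(F)
for row in R:
    print("  ".join(f"{value:.2f}" for value in row))
```

In the printed product the diagonal entries are the communalities of the tests and the off-diagonal entries are the reproduced correlations.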

The number of common factors (columns) 
necessary in F to reproduce R is the rank of 
R. Since the elements of R are infested with 
sampling errors, the rank of R in the mathe- 
matical sense is generally equal to the order 
of R, i.e. the number of tests in the battery. 
In factor analysis the statistical rank of R is 
taken to be the least number of columns required
in F to reproduce R = FF’ such that
the differences between the elements in FF’ 
and the observed correlations in R are small 
enough to be reasonably attributed to the 
sampling errors in the observed correlation 
coefficients. Rank as subsequently used in 
this discussion will refer to statistical rank as 
here described rather than rank as defined in 
matrix algebra. The differences between the 
observed elements of R and those obtained 
in the product FF’ are called residuals. 
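The notions of residual and statistical rank may be illustrated with a small sketch. Everything in it is assumed for the purpose of illustration: the loadings in `F` are hypothetical, the "observed" correlations are built from them with a fixed disturbance of plus or minus .02 standing in for sampling error, the tolerance of .05 is arbitrary, and the function names are inventions. In an actual analysis the factors are extracted from R rather than given in advance, so this shows only the residual criterion, not a method of factoring.

```python
# Hypothetical loadings (tests 1-6 by factors A, B, C); not the article's values.
F = [
    [0.5, 0.0, 0.6],
    [0.0, 0.8, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.0, 0.9],
    [0.7, 0.0, 0.0],
    [0.0, 0.7, 0.5],
]


def product(loadings, k):
    """Return FF' computed from the first k columns of F only."""
    return [
        [sum(ri[m] * rj[m] for m in range(k)) for rj in loadings]
        for ri in loadings
    ]


def disturbance(i, j):
    """Fixed stand-in for sampling error on the off-diagonal entries."""
    if i == j:
        return 0.0
    return 0.02 if (i + j) % 2 == 0 else -0.02


n = len(F)
exact = product(F, 3)
R_obs = [[exact[i][j] + disturbance(i, j) for j in range(n)] for i in range(n)]


def largest_residual(observed, loadings, k):
    """Largest absolute off-diagonal difference between R and FF' (k columns)."""
    fitted = product(loadings, k)
    return max(
        abs(observed[i][j] - fitted[i][j])
        for i in range(len(observed))
        for j in range(len(observed))
        if i != j
    )


def least_columns(observed, loadings, tolerance):
    """Least number of leading columns whose residuals all fall within tolerance."""
    for k in range(1, len(loadings[0]) + 1):
        if largest_residual(observed, loadings, k) <= tolerance:
            return k
    return None


for k in (1, 2, 3):
    print(k, round(largest_residual(R_obs, F, k), 2))
print("statistical rank under the assumed tolerance:", least_columns(R_obs, F, 0.05))
```

With one or two columns the residuals are far larger than the assumed disturbance; with three they shrink to the size of the disturbance itself, which is the sense in which three factors suffice here.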

The columns of F correspond to the com- 
mon factors and the rows of F to the tests. 
The three common factors in F are labeled 
A, B, and C. Each of the 18 cells in F con- 
tains a number showing the factor loading of 
the test in that row on the common factor in 
that column. A factor loading is equal to the 
correlation coefficient between the test and 
the common factor. The factor loading 
squared equals that fraction of the test vari- 
ance which is accounted for by the common 
factor. The sum of squares of the factor load- 
ings for a test equals the communality of that 
test or the fraction of the total variance 
(unity) of the test that is accounted for by 
all the common factors. Thus for test 6, the 
sum of squares is .74, indicating that 74% 
of the total variance of test 6 is accounted 
for by the common factors. 
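The arithmetic of the communality can be sketched as follows. The loadings are hypothetical, chosen so that test 6 has the communality of .74 mentioned above; the function name `communalities` is an assumption.

```python
# Hypothetical loadings (tests 1-6 by factors A, B, C); not the article's values.
F = [
    [0.5, 0.0, 0.6],
    [0.0, 0.8, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.0, 0.9],
    [0.7, 0.0, 0.0],
    [0.0, 0.7, 0.5],
]


def communalities(loadings):
    """Sum of squared loadings in each row: the share of variance held in common."""
    return [sum(a * a for a in row) for row in loadings]


h2 = communalities(F)
for test_number, value in enumerate(h2, start=1):
    print(f"test {test_number}: communality {value:.2f}")
```

For test 6 the squared loadings are .49 and .25, whose sum is .74, in agreement with the figure given in the text.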

To rotate F (6 rows x 3 columns) means to multiply it on the right by a square (3x3) matrix G, called the transformation matrix. The resultant matrix product V (6x3), where V = FG, is a rotated common factor matrix. The transformation matrix G may be either orthogonal or oblique.


If G is orthogonal, then the common factor 
axes in V remain mutually at right angles 
with each other; i.e. the correlation between 
any two common factors is zero. With G 
orthogonal, the sum of squares of a row of V 
still equals the test communality, and the 
inner product between any two rows of V 
reproduces the corresponding correlation 
coefficient in R. 
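The two properties of an orthogonal rotation stated above can be verified on a small example. This is a sketch under assumptions: the loadings in `F` are hypothetical, and the transformation G is taken, purely for illustration, to be a rotation of 30 degrees in the plane of factors A and B.

```python
import math

# Hypothetical loadings (tests 1-6 by factors A, B, C); not the article's values.
F = [
    [0.5, 0.0, 0.6],
    [0.0, 0.8, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.0, 0.9],
    [0.7, 0.0, 0.0],
    [0.0, 0.7, 0.5],
]

# Assumed orthogonal transformation: rotate axes A and B by 30 degrees.
theta = math.radians(30)
c, s = math.cos(theta), math.sin(theta)
G = [
    [c, -s, 0.0],
    [s, c, 0.0],
    [0.0, 0.0, 1.0],
]


def matmul(left, right):
    """Ordinary matrix product of two lists of rows."""
    return [
        [sum(left[i][k] * right[k][j] for k in range(len(right)))
         for j in range(len(right[0]))]
        for i in range(len(left))
    ]


def inner_products(loadings):
    """Inner products between all pairs of rows, i.e. the product with the transpose."""
    return [
        [sum(a * b for a, b in zip(ri, rj)) for rj in loadings]
        for ri in loadings
    ]


V = matmul(F, G)
R_from_F = inner_products(F)
R_from_V = inner_products(V)

largest_difference = max(
    abs(R_from_F[i][j] - R_from_V[i][j]) for i in range(6) for j in range(6)
)
print("largest difference between FF' and VV':", largest_difference)
```

The diagonal entries compared here are the communalities and the off-diagonal entries are the reproduced correlations, so a difference of essentially zero confirms both statements at once.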


If G is oblique, the common factors in V 
are no longer mutually independent of each 
other and it becomes necessary to obtain the 
product 

G’G=H 


in order to see the manner in which the com- 
mon factors of V are interrelated. The side 
entries of H show the correlation coefficients 
of the oblique axes of V with each other. 
These are not zero in the oblique case as they 
are in the orthogonal case. 
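A short sketch of the product G’G may help. It assumes, as is customary but not stated in the text, that each column of G is of unit length, so that the entries of H can be read as cosines, or correlations, between the axes. The particular G below, with the second axis inclined 60 degrees to the first, is hypothetical.

```python
import math

# Assumed oblique transformation: each column is a unit vector giving one axis.
# The second axis is inclined 60 degrees to the first; the third is unchanged.
angle = math.radians(60)
G = [
    [1.0, math.cos(angle), 0.0],
    [0.0, math.sin(angle), 0.0],
    [0.0, 0.0, 1.0],
]


def gram(matrix):
    """Return G'G: entry (i, j) is the inner product of columns i and j of G."""
    size = len(matrix[0])
    return [
        [sum(row[i] * row[j] for row in matrix) for j in range(size)]
        for i in range(size)
    ]


H = gram(G)
for row in H:
    print("  ".join(f"{value:.2f}" for value in row))
```

The diagonal of H is unity, and the entry of .50 off the diagonal shows that the first two axes are no longer independent; with an orthogonal G every off-diagonal entry would be zero.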


By a general common factor is meant one 
upon which each test in the battery has a 
non-zero loading; i.e., a column in F or V
which had no zeros would correspond to a 
general factor. A group factor is one that is 
common to a sub-group of two or more tests 
of the battery; e.g., in F all three factors are 
group factors. A specific factor is one that is 
found in only one test of the battery and 
hence is not a part of the common factor 
matrix. The complexity of a test means the 
number of common factors upon which it has 
significantly large loadings. Thus, in F, tests 
1, 3, and 6 are of complexity 2 whereas tests 
2, 4, and 5 are of complexity 1. 
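These definitions lend themselves to a simple count. In the sketch below the loadings are hypothetical (chosen so that the complexities agree with those just stated), the cut-off of .05 for a "significantly large" loading is an arbitrary assumption, and the function names are inventions for illustration.

```python
# Hypothetical loadings (tests 1-6 by factors A, B, C); not the article's values.
F = [
    [0.5, 0.0, 0.6],
    [0.0, 0.8, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.0, 0.9],
    [0.7, 0.0, 0.0],
    [0.0, 0.7, 0.5],
]

CUTOFF = 0.05  # assumed threshold below which a loading is treated as zero


def complexity(row):
    """Number of common factors on which a test has a non-negligible loading."""
    return sum(1 for loading in row if abs(loading) > CUTOFF)


def factor_kind(column):
    """Classify a column as general, group, or not common, by its non-zero count."""
    loaded = sum(1 for loading in column if abs(loading) > CUTOFF)
    if loaded == len(column):
        return "general"
    if loaded >= 2:
        return "group"
    return "not common"


complexities = [complexity(row) for row in F]
kinds = [factor_kind([row[k] for row in F]) for k in range(3)]

print("complexities of tests 1-6:", complexities)
print("kinds of factors A, B, C:", kinds)
```

With these assumed loadings, tests 1, 3, and 6 come out with complexity 2 and the rest with complexity 1, and each of the three columns is classed as a group factor.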

In factor analysis the total test variance is assumed to be made up of the communality ($h^2$), specificity ($b^2$), and error variance ($e^2$). These are related thus: $h^2 + b^2 + e^2 = 1$.

The basic assumption underlying the factor methods is stated by Thurstone²: “The performance of an individual on a test is determined in part by the abilities that are called for by the test and in part by the degree to which the individual possesses these abilities.” The manner in which the individual’s test score is determined by the factors called for by the test and the degree to which the individual possesses the common factor abilities is assumed to be

$$S_{ji} = a_{j1}x_{1i} + a_{j2}x_{2i} + \cdots + a_{jq}x_{qi}$$

where

$S_{ji}$ = the standard score of the ith individual on the jth test.

x’s represent standard scores of individuals in q statistically independent arbitrary reference abilities.

a’s represent factor loadings in the tests.

² L. L. Thurstone, The Vectors of Mind (Chicago: The University of Chicago Press, 1935).


The x’s describe the individuals, the a’s describe the tests. The elements of F are the common factor a’s of a battery of tests.

The function for $S_{ji}$ assumed in factor analysis may not be the manner in which test scores are actually dependent upon individuals’ abilities and tests, but it serves as a useful first approximation in the early stages of finding landmarks in the field of human abilities. Once such landmarks are found, defined, subjected to more direct measurement, and rather generally accepted by scientists in the field, the more exact nature of the function $S_{ji}$ will be revealed.
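The assumed function is a simple weighted sum, as the following sketch shows. Both the loadings (the a’s) and the individual’s standard scores on the reference abilities (the x’s) are hypothetical, and the function name is an invention. Only the common-factor terms written out in the equation are computed; any contribution of specific factors or of error is left out.

```python
# Hypothetical loadings a_jm (tests 1-6 by factors A, B, C); not the article's values.
F = [
    [0.5, 0.0, 0.6],
    [0.0, 0.8, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.0, 0.9],
    [0.7, 0.0, 0.0],
    [0.0, 0.7, 0.5],
]

# Hypothetical standard scores x_mi of one individual on abilities A, B, C.
x = [1.0, -0.5, 2.0]


def common_factor_score(loadings_row, abilities):
    """Weighted sum a_j1*x_1i + ... + a_jq*x_qi for one test and one individual."""
    return sum(a * ability for a, ability in zip(loadings_row, abilities))


scores = [common_factor_score(row, x) for row in F]
for test_number, value in enumerate(scores, start=1):
    print(f"test {test_number}: {value:+.2f}")
```

For test 6, for instance, the sum is (.0)(1.0) + (.7)(-.5) + (.5)(2.0) = .65: the individual's high standing on factor C outweighs a low standing on factor B.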


SOME FACTOR METHODS


Chief among the multi-factor methods of analysis are the bi-factor analysis of Holzinger,³ the principal axes analysis of Hotelling,⁴ and the multiple factor analysis of Thurstone.⁵


The bi-factor analysis postulates the existence
of a general factor common to all the
tests in the battery plus group factors which 
are common to sub-groups of the tests. Each 
test has loadings on two factors, the general 
and one group factor. 

The principal axes analysis makes no assumptions regarding the number of common or group factors but locates the first factor in such a manner that the sum of squares of projections of all the tests on the first factor is maximized. The second factor is orthogonal to the first and located such that the sum of squares of the projections of the residuals of the tests is maximized, and so on, until all the common factors have been extracted. In this solution each test generally has loadings on all of the common factors.

³ K. J. Holzinger and H. H. Harman, Factor Analysis (Chicago: The University of Chicago Press, 1941).

⁴ H. Hotelling, “Analysis of a Complex of Statistical Variables into Principal Components,” Journal of Educational Psychology, (September, 1933), 417-441; (October, 1933), 498-529.
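The location of the first principal axis can be sketched numerically. The code below is an illustration under assumptions, not a reproduction of Hotelling's computing routine: the correlation matrix (with communalities in the diagonal) is built from hypothetical loadings, and a plain power iteration is used as a stand-in procedure to find the axis on which the sum of squared projections is greatest. The function names and the iteration count are likewise assumed.

```python
import math

# Hypothetical loadings (tests 1-6 by factors A, B, C); not the article's values.
F = [
    [0.5, 0.0, 0.6],
    [0.0, 0.8, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.0, 0.9],
    [0.7, 0.0, 0.0],
    [0.0, 0.7, 0.5],
]


def correlations_with_communalities(loadings):
    """FF': correlations off the diagonal, communalities on the diagonal."""
    return [
        [sum(a * b for a, b in zip(ri, rj)) for rj in loadings]
        for ri in loadings
    ]


def first_principal_axis(matrix, iterations=500):
    """Power iteration: returns the largest root and the loadings on that axis."""
    size = len(matrix)
    vector = [1.0] * size
    for _ in range(iterations):
        image = [sum(matrix[i][j] * vector[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(value * value for value in image))
        vector = [value / norm for value in image]
    root = sum(
        vector[i] * sum(matrix[i][j] * vector[j] for j in range(size))
        for i in range(size)
    )
    loadings = [math.sqrt(root) * value for value in vector]
    return root, loadings


R = correlations_with_communalities(F)
root, axis_loadings = first_principal_axis(R)

column_sums = [sum(row[k] ** 2 for row in F) for k in range(3)]
print("sum of squared loadings on first principal axis:", round(root, 3))
print("sums of squared loadings on columns A, B, C of F:", column_sums)
print("loadings on the first principal axis:", [round(v, 2) for v in axis_loadings])
```

The sum of squares on the first principal axis exceeds that of any single column of the hypothetical F, and every test has a non-zero loading on it, which illustrates why this solution seldom contains zeros.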


The multiple-factor analysis of Thurstone 
first finds a centroid common factor matrix 
which satisfactorily accounts for the intercor- 
relations between the tests and then rotates 
this common factor matrix such that the 
number of zeros in it is maximized. In this 
rotation the mutual orthogonality of the com- 
mon factor axes can either be preserved or 
dispensed with, according as the investigator 
wishes to preserve the mutual independence 
of the common factors or obtain more zeros 
at the expense of having the common factors 
mutually interdependent. No assumptions are 
made concerning the existence of general or 
group factors. In the Thurstone solution a 
general factor may or may not appear. A 
given test may be dependent upon one or 
several common factors. 
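The criterion of a maximum number of zeros can be made concrete by counting them. The sketch below does not carry out a Thurstone rotation; it merely compares the count of near-zero loadings in a hypothetical matrix F with the count after an arbitrary orthogonal rotation of 45 degrees in the plane of factors A and B. The loadings, the angle, the cut-off of .05, and the function names are all assumptions.

```python
import math

# Hypothetical loadings (tests 1-6 by factors A, B, C); not the article's values.
F = [
    [0.5, 0.0, 0.6],
    [0.0, 0.8, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.0, 0.9],
    [0.7, 0.0, 0.0],
    [0.0, 0.7, 0.5],
]

CUTOFF = 0.05  # assumed threshold for a "near-zero" loading


def rotate_ab(loadings, degrees):
    """Orthogonal rotation of the A and B axes; factor C is left unchanged."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [[a * c + b * s, -a * s + b * c, cc] for a, b, cc in loadings]


def count_near_zeros(loadings):
    """Number of cells whose absolute loading falls below the cut-off."""
    return sum(1 for row in loadings for value in row if abs(value) < CUTOFF)


V = rotate_ab(F, 45)
zeros_in_F = count_near_zeros(F)
zeros_in_V = count_near_zeros(V)
print("near-zero loadings in F:", zeros_in_F)
print("near-zero loadings after rotating 45 degrees:", zeros_in_V)
```

Both matrices reproduce the same correlations, yet the first has nine zeros among its eighteen cells and the second only five; a rotation procedure of the kind described would prefer the first.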


THE SELECTION OF A METHOD FOR
THIS STUDY


Which of the methods should be used in 
the resolution of a multiplicity of teacher 
abilities into common factors? The answer to 
this question lies partially in the answer to: 
Why do a factor analysis in the first place? 


This factor analysis was undertaken be- 
cause it was believed that twenty tests of 
teacher abilities were not measures of twenty 
different things but that all twenty could be 
comprehended in terms of a much smaller 
number of concepts. Thurstone⁶ generalizes
this belief thus: “It is the faith of all science 
that an unlimited number of phenomena can 
be comprehended in terms of a limited num- 
ber of concepts or ideal constructs.” From a 
practical standpoint, volumes of research in- 
volving hundreds of teacher tests are unlikely 
to be of much use in the selection, training, 
and hiring of teachers unless this research is 
boiled down to the few significant common 
factors that can be assumed to operate in this 
field of human behavior. 


The several factor methods are all in agree- 
ment with respect to this tenet. They, there- 
fore, all have the problem of determining the 
least number of common factors that will 
satisfactorily account for the observed inter- 
correlations. 


Given the least number of common factors required by a correlation table, which method should be used in factoring it? There are an infinite number of common factor matrices for a given rank of the correlation matrix which will account for the observed correlations equally well. Among these are the bi-factor solution, the principal axes solution, and the Thurstone solution with either orthogonal or oblique axes. Which of these should be used here? The writer believes in the selection of that solution which accounts for the intercorrelations in the simplest manner; i.e., the solution which is most parsimonious. A common factor matrix of 20 tests involving 5 common factors involves altogether 100 cells which may contain either zeros or positive or negative numbers. If many of these entries are zero, it is much easier for the mind to retain the factor pattern which describes the tests. The writer, therefore, has insisted upon a solution which will have a maximum number of zero entries in the common factor matrix consistent with other important considerations. This choice excludes the principal axes solution, which generally has relatively few zeros in its resultant matrix, as a desirable solution here.

⁶ Thurstone, op. cit., p. 44.


The teacher traits which will be analyzed 
in this study include mental ability, certain 
personality traits, supervisory ratings, as well 
as ability to teach as measured by pupil 
growth. Previous studies reveal many near- 
zero correlations between various pairs of 
these traits. It does not seem reasonable, 
therefore, to postulate a general factor com- 
mon to all of the tests in analyzing this prob- 
lem. This consideration eliminates the bi- 
factor solution as a desirable solution here. 


There remains then only the multi-factor 
solution of Thurstone which makes no 
assumptions concerning either the extent of 
group factors or the complexity of the tests. 
A Thurstone solution can, however, be made 
with either orthogonal or oblique axes. A 
solution permitting oblique axes may yield a 
few more zeros in the rotated common factor 
matrix but carries with it non-zero entries in 
the non-diagonal cells of a matrix showing 
the correlations of the common factors with 
each other. Thus, if there are six oblique 
common factors, there will be fifteen non- 
zero numbers describing the manner in which 
the common factors are interrelated. In an 
oblique solution these fifteen numbers must
be constantly kept in mind when thinking
about the common factor pattern of the tests.
From the point of view of parsimony, there- 
fore, it would seem desirable to insist that 
these fifteen non-diagonal entries should all 
be zero. Parsimony, in this instance, is more 
than a total of zeros in the factorial matrix. 
If the common factors are interrelated, vary- 
ing one of them in an individual not only 
affects the performance of the individual on 
the tests which have loadings on that factor 
but also varies all other common factors 
which in turn affect the performance of the 
individual on all of the tests. This manner of 
solution would seem to render the thought 
processes in dealing with the common factor 
solution of a battery of tests almost as com- 
plex as dealing with the original set of zero- 
order correlations. 
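The counts used in this argument follow from simple formulas, sketched below with assumed function names: a common factor matrix of n tests and q factors has n times q cells, and q oblique factors have q(q - 1)/2 correlations among themselves.

```python
from math import comb


def loading_cells(tests, factors):
    """Number of cells in a common factor matrix of the given size."""
    return tests * factors


def factor_intercorrelations(factors):
    """Number of distinct correlations among the common factors: q(q - 1)/2."""
    return comb(factors, 2)


print("cells for 20 tests and 5 factors:", loading_cells(20, 5))
print("intercorrelations among 6 oblique factors:", factor_intercorrelations(6))
```

These reproduce the figures of 100 cells and fifteen intercorrelations cited in the text; the second count grows roughly as the square of the number of factors, which is the burden the orthogonal solution avoids.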


In the common factor solutions of two sets 
of teacher abilities to be reported in the next 
two sections, the writer made no a priori
assumptions concerning: (1) the existence or 
non-existence of a general factor, (2) the 
complexity of any test, or (3) whether the 
tests shall be contained in the positive mani- 
fold or not. The assumptions which were 
made, however, are: (1) the validity, as a 
first approximation, of the basic equation for 
$S_{ji}$ given above, (2) the best solution should
be the most parsimonious solution; i.e., have 
the greatest number of near-zeros in the 
rotated common factor matrix, and (3) the 
common factors shall remain orthogonal 
under rotation. An objective index of rota- 
tional fit consistent with maximizing the num- 
ber of zeros was used in determining which 
rotations shall be made, eliminating much of 
the subjectivity which pervades present 
methods of rotation. 

The following two studies therefore have 
attempted to arrive at unbiased but parsimo- 
nious common factor solutions of two tables 
of intercorrelation of teacher abilities. 


Implications and findings concerning the 
nature and number of common factors in each 
battery of teacher abilities will be considered 
separately for each study at the close of the 
corresponding section. The manner in which 
the findings from the two studies agree and 
differ will be considered in the fifth section 
dealing with the summary and conclusions. 







SECTION III 


ANALYSIS OF STUDY NUMBER ONE: 
THE 1936-37 WISCONSIN STUDY 
OF TEACHING ABILITY 


The general plan of this study, the teach- 
ers, and the tests employed have been 
described by Rostker.** The teachers were 
twenty-four seventh- and eighth-grade teach- 
ers in non-departmentalized rural state graded 
schools in south central Wisconsin. Both 
teachers and pupils were given a number 
of tests. The pupils were given a battery 
of social studies tests and tests of reading 
and intelligence. The teachers were given 
tests of intelligence, knowledge of subject 
matter, personality, social attitudes, teach- 
ing attitudes and a number of other aspects 
of teacher abilities. The criterion of teaching 
ability was that of pupil gain. A list of the 
teacher measures with the key number for 
each is given below: 


(1) ACPE, American Council on Educa- 
tion Psychological Examination for 
College Freshmen, 1936 Edition. 

(2a) TCPE (1934), The Teachers College 
Psychological Examination, 1934 
Edition. 

(3) ACCGT, American Council Civics 
and Government Test, Form B. 

(4) MTIL, Morris Trait Index L. 

(5) WSAI, Washburne Social Adjustment 
Inventory, Thaspic Edition.

(6a) TDRS(E), Torgerson Diagnostic
Teacher Rating Scale of Instructional
Activities. The ratings assigned by
the investigators (E) were used here.

(7a) ASRS(E), Almy-Sorenson Rating
Scale for Teachers. Investigators'
ratings.

(8a) MTRS(E), Michigan Teacher Rating
Scale. Investigators' ratings.

(9) OTFAE, Lewerenz-Steinmetz Orientation
Test Concerning Fundamental
Aims of Education, 1935 Revision.

(10) ATTP, Yeager—Scale for Measuring
Attitude Toward Teachers and the
Teaching Profession.

(11a) PGTA (36-37), Pupil Gain Index
of Teaching Ability in Social Studies
at the Eighth Grade Level, as established
by Rostker (C,).


** Rostker, ... Journal of Experimental Education, ... 1945), pp. 7-22.



(12) AORM, Wrightstone — Abilities to 
Organize Research Material. 

(13) BPI(F1), Bernreuter Personality
Inventory: Self-Confidence.

(14) BPI(F2), Bernreuter Personality
Inventory: Sociability.

(15) SEAT(T-A), Stanford Educational 
Aptitudes Test: Teaching versus Ad- 
ministrative Ability. 

(16) SEAT(A-R), Stanford Educational 
Aptitudes Test: Administrative 
versus Research Ability. 

(17) SASST (Tot.), Hartmann—Social
Attitudes of Secondary School
Teachers: Parts I and III.

(18) TPMH, Torgerson—Theory and 
Practice of Mental Hygiene. 

(19) TPPT, Torgerson—Teaching Prob- 
lems and Teaching Processes Test. 


THE CORRELATION TABLE 


The nineteen teacher measures listed 
above were correlated with each other in the 
following manner: (1) measures which were 
not already in standard score form were so 
converted, all standard scores being carried 
to two decimal places; (2) the sums of cross- 
products were computed on a Monroe Calcu- 
lator; and (3) the correlation coefficients 
were obtained by substitution in the formula 


$$r_{ij} = \frac{1}{N}\sum_{k=1}^{24} z_{ik}\,z_{jk}$$
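The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw scores are hypothetical, the function names `standardize` and `correlation` are invented for the example, and the standard deviation is taken with divisor N so that the mean cross-product of standard scores is the product-moment correlation.

```python
import math

def standardize(scores):
    """Convert raw scores to standard scores (mean 0, S.D. 1, divisor N)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return [(x - mean) / sd for x in scores]

def correlation(z_i, z_j):
    """Mean cross-product of two sets of standard scores."""
    n = len(z_i)
    return sum(a * b for a, b in zip(z_i, z_j)) / n

# Hypothetical raw scores of four teachers on two measures.
test_a = [10.0, 12.0, 14.0, 16.0]
test_b = [3.0, 5.0, 4.0, 8.0]
r_ab = correlation(standardize(test_a), standardize(test_b))
```

In the study itself the sum runs over the 24 teachers, and the cross-products were accumulated on a desk calculator rather than by program.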


The resultant table of intercorrelations is 
reported by Rostker.* It should be recalled
that low scores on variables (10)—ATTP,
(13)—BPI(F1), and (14)—BPI(F2)
are desirable from the point of view of meas- 
uring the trait defined by the test. In the 
case of (15)—SEAT(T-A) and (16)— 
SEAT (A-R), high or low scores indicate pre- 
dominance of either the first or second traits 
in the difference respectively. In all other 
tests, high scores indicate the desirable ex- 
treme of the trait continuum. 

In readying a correlation table for centroid 
factorization, it is desirable to maximize the 
number of positive correlation coefficients. 
Doing this enables the first centroid factor to 
extract a greater portion of the common 
factor variance from the battery of tests. 
The number of negative correlations can be 
reduced if the intercorrelations of any test 

* Rostker, Ibid., pp. 41-43.







with all other tests show a majority of negative
coefficients. Reduction is achieved by
reflecting such a test; i.e., reversing the signs of
all correlations in the row and column cor- 
responding to the particular test. Although 
this reverses the meaning of the high and low 
extremes of the scores, the factor solution can 
be carried out with the test reversed, after 
which one needs only to reverse again all 




signs corresponding to the test in the factorial 
matrix to restore the test scores to their orig- 
inal direction. 
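Reflection as described above amounts to reversing the signs in one row and the matching column of the correlation table. A minimal sketch follows; the 3 x 3 table is hypothetical and the function names are invented for illustration.

```python
def reflect(R, j):
    """Return a copy of correlation table R with test j reflected:
    signs reversed in row j and column j, diagonal cell unchanged."""
    n = len(R)
    out = [row[:] for row in R]
    for k in range(n):
        if k != j:
            out[j][k] = -out[j][k]
            out[k][j] = -out[k][j]
    return out

def count_negative(R):
    """Number of negative entries on one side of the diagonal."""
    n = len(R)
    return sum(1 for a in range(n) for b in range(a + 1, n) if R[a][b] < 0)

# Hypothetical table in which the third test correlates negatively
# with both of the others.
R = [[1.0, 0.5, -0.4],
     [0.5, 1.0, -0.3],
     [-0.4, -0.3, 1.0]]
R_reflected = reflect(R, 2)
```

Reflecting the same test a second time restores the original table, which is why the signs in the factorial matrix can simply be reversed again after factoring.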

Inspection of the correlation table revealed 
that the number of negative entries could be 
considerably reduced by reflecting (13)—
BPI(F1) and (15)—SEAT(T-A). This was
done. The resultant table of correlations is
presented in Table II. In this correlation


TABLE II 


THE CORRELATION MATRIX R_a: INTERCORRELATION TABLE OF 19 TEACHER ABILITIES
BASED ON 24 TEACHERS 


(Study A) 


(2a) 
(1) _  -—'P . 795 . 489 
(2a) TCPE (1934) _ - : . 795 ¢ 7% . 503 
(12) AORM paso . 503 - 
(3) er aE ’ . 575 . 648 
(13) : , . 038 ~.074 
(14) _ ; . 163 . 207 
(4) MTIL cane . 242 . 895 
(5) eda 7 .319 -.017 
(6a) , . 437 . 443 
(Ta) . 267 335 
(8a) . 329 . 249 
(15) .015 .217 
(16) . 084 
(17) . 592 
(18) . 468 


No. Test Code No. (1) (12) (13) 


. 196 


(14)* 


(9) seat } 497 


(10) 

(19) : anal 
(lla) PGTA (36-37) - 

* Low Scores desirable. 

** High Scores signify administrative end of continuum. 


. 388 
—. 080 
. 492 


TABLE II—Continued


No. (4) (5) (6a) 
ACPE_. .472 . 254 . 636 
TCPE (1934) __- . 242 . 319 . 437 
RE . 395 . 443 
: . 484 j .495 

.217 , . 327 

.472 ’ . 065 

) ’ . 457 

: . 224 

. 457 : ) 

. 416 : . 874 

.419 : . 857 

. 369 : .119 

. 083 : . 041 

. 805 . 457 

. 530 . 804 

. 547 
. 022 
. 390 
. 188 


Test Code 


(11a) 141 


* Low scores desirable. 
** High scores signify administrative end of continuum. 







TABLE II—Continued 
(9) 


No. Test Code No. (16)** (17) 


. 565 
- 416 
. 592 
. 552 
.214 
. 046 
. 305 
. 184 
. 457 
. 341 
.413 


(18) 

. 877 
. 343 
. 468 
. 520 
.214 
- 160 
. 530 
. 120 
. 304 
.314 
. 268 
. 043 
. 358 
- 409 
- 409 vee 
. 499 . 516 
. 327 


(10)* 
- 806 
.121 
. 388 
. 318 
—.024 
. 189 
—. 062 
—. 142 
. 022 
—.031 
. 028 
. 216 
. 084 
. 327 
—.075 
. 140 


(19) 
.114 
. 190 


(lla) 
. 580 
. 866 
. 492 
. 372 
. 285 
. 088 
. 260 
.141 
. 188 
. 148 
. 155 
. 182 


ACPE —. 063 
TCPE(1934)... —. 086 
AORM . 084 


TDRS (E)----- 
ASRS (E) 
MTRS (E)- -- - 
SEAT (T-A)** ‘ —. 025 
SEAT (A-R)**_ (_) —. 226 
SASST (Tot.).. —. 226 Ro 
(a) Zeeee....... 8 
(9) 
(10) 


(19) 
(lla) 


. 053 


TPPT . 082 
PGTA (36-37). =. 112 . 501 


* Low scores desirable. 


—.075 ; (3 
- 261 ‘ 
- 460 


- 445 


** High scores signify administrative end of continuum. 


matrix R_a, 68 of the coefficients are negative.
Since the matrix is symmetric, this means
that 34 of the 171 entries on one side of
the diagonal are negative. The frequency
distribution of these 171 different
intercorrelations is shown in Table III.


TABLE III 


FREQUENCY DISTRIBUTION OF THE INTER- 
CORRELATIONS OF THE NINETEEN 
TEACHER MEASURES 


[Class intervals and frequencies illegible in the source.]


The algebraic mean of the 171 correlations
in R_a obtained by addition of the individual
elements is .229, and the average of their
absolute values is .287.


THE FACTOR ANALYSIS


The centroid factorization.—The factorial
analysis of matrix R_a consists of two steps:
(1) finding a factorial matrix F_a such that
F_a F_a' will reproduce R_a within limits small
enough to be attributable to sampling errors,
and (2) rotating F_a by an orthogonal
transformation G_a into a factorial matrix V_a that
contains a maximum number of near-zero
entries.


The method of factorization used was the
centroid method described by Thurstone.⁷
Since the diagonal elements should have
entries equal to the communalities of the tests
and since these remain unknown until after
the correlation matrix has been factored, they
must initially be approximated. The
approximation used here is that recommended by
Thurstone;⁸ i.e., placing in the diagonal cell
the highest absolute value of the correlations
of the test with any other test. The reflection
of tests (13) and (15) described above led
to positive B's for all columns of the
correlation matrix, where B is the algebraic sum of
all non-diagonal entries in a column.


The centroid method of factorization
consists briefly of (1) finding r_t, which equals
the algebraic sum of all the coefficients in the
table, including the approximated diagonals;
(2) calculating 1/√r_t; (3) obtaining the
algebraic sum of each column of R_a,
including the approximated diagonal entry; and
(4) obtaining the factor loadings of each test
by multiplying the column sums by 1/√r_t,
due account being taken of the algebraic sign
of the loading if the test in that column has
been reflected prior to the extraction of the
given factor.
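The four steps can be sketched as follows. This is an illustration only: the 3 x 3 table is hypothetical, its diagonal cells hold the highest correlation in each column as the communality estimate, the function names are invented, and the sign bookkeeping for reflected tests is omitted.

```python
import math

def first_centroid_loadings(R):
    """Centroid loadings from a symmetric table R whose diagonal holds
    communality estimates and whose column sums are all positive."""
    n = len(R)
    col_sums = [sum(R[j][k] for j in range(n)) for k in range(n)]
    total = sum(col_sums)                 # sum of every entry in the table
    scale = 1.0 / math.sqrt(total)
    return [s * scale for s in col_sums]

def residual_table(R, loadings):
    """Residuals after removing one factor: r_jk - a_j * a_k."""
    n = len(R)
    return [[R[j][k] - loadings[j] * loadings[k] for k in range(n)]
            for j in range(n)]

# Hypothetical table with estimated communalities in the diagonal.
R = [[0.6, 0.6, 0.3],
     [0.6, 0.6, 0.4],
     [0.3, 0.4, 0.4]]
a = first_centroid_loadings(R)
res = residual_table(R, a)
column_sums_of_residuals = [sum(res[j][k] for j in range(3)) for k in range(3)]
```

Every column of the residual table sums to zero, which is the arithmetic form of the statement that the centroid of the residuals lies at the origin and that some tests must be reflected before the next factor is extracted.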


7 Thurstone, op. cit., pp. 232-250. 
8 Ibid., p. 89. 







The first centroid factor extracted from R_a
is listed in column I of Table VI. Using these
first factor loadings a_j1, the table of first
factor residual coefficients r_1.jk was calculated,
where r_1.jk = r_jk − a_j1 a_k1. Since the centroid
of any table of residuals so derived is coincident
with the origin, it is necessary to reflect
some of the tests in order to remove the centroid
from the origin before the next centroid
factor can be extracted. In doing this, it is
desirable to remove the centroid from the
origin as far as possible so that the factor will
remove as much of the residual variance as
possible. The method of reflection or sign
changing used was a variation of the methods
described by Thurstone.⁹ Instead of considering
only the number of negative entries in
a column to determine which test shall be
reflected first, second, etc., special notice was
taken of the signs of the residuals whose absolute
values exceeded some arbitrary value
(ranging from .30 for the first factor residuals
to .09 for the fifth factor residuals). Let X
equal the total number of negative non-diagonal
elements in a column of residuals,
Y the number of negative elements whose
absolute values exceed the arbitrary value,
and Z the number of positive elements whose
absolute values exceed the arbitrary value.
The index used to determine which test to
reflect first, second, etc., then was W, where

$$W = X + 2(Y - Z)$$

The test for which this index was largest at
each stage of the sign changing was the one
selected for reflection. Reflection was continued
until as many as possible of the large
residuals were positive. The application of
the resultant sign changes to the residuals in
all cases led to positive B's, which is what was
wanted. This method appears to be less laborious
than calculating new B's after each reflection
and more certain of arriving at positive
B's than considering only the number of
negative signs regardless of the size of the
residuals to which the signs are attached.
Thus, in the reflection of all residual tables,
the sign changing was continued until all B's
were positive. Table IV shows which tests
were reflected in each table of residuals.
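The sign-changing index lends itself to a short sketch. The residual table below is hypothetical, the cutoff of .30 is the value quoted for first factor residuals, and the function names are invented for the example.

```python
def reflection_index(residuals, j, cutoff):
    """W = X + 2(Y - Z) for column j of a table of residuals."""
    column = [residuals[k][j] for k in range(len(residuals)) if k != j]
    x = sum(1 for v in column if v < 0)                      # all negatives
    y = sum(1 for v in column if v < 0 and abs(v) > cutoff)  # large negatives
    z = sum(1 for v in column if v > 0 and abs(v) > cutoff)  # large positives
    return x + 2 * (y - z)

def next_test_to_reflect(residuals, cutoff):
    """Index of the column with the largest W."""
    n = len(residuals)
    return max(range(n), key=lambda j: reflection_index(residuals, j, cutoff))

# Hypothetical symmetric table of first factor residuals.
res = [[0.00, 0.35, -0.40, -0.10],
       [0.35, 0.00, -0.32, 0.05],
       [-0.40, -0.32, 0.00, -0.20],
       [-0.10, 0.05, -0.20, 0.00]]
w_values = [reflection_index(res, j, 0.30) for j in range(4)]
chosen = next_test_to_reflect(res, 0.30)
```

Here the third test (index 2) has three negative residuals, two of them large, so it is the first candidate for reflection. In practice the index would be recomputed on the reflected table before the next test is chosen.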


The diagonal elements were approximated 
at each stage by inserting the highest absolute 
value of the non-diagonal elements in each 
column. 

9 Thurstone, op. cit., p. 107.




The question of when to stop extracting
centroid factors is identical with the question
concerning how many common factors are
necessary to account for the elements of R_a
within the limits of sampling errors in the
original r's. After reviewing the literature on
this point (see especially Thurstone,¹⁰
Mosier,¹¹ Coombs,¹² and McNemar¹³), the
writer is inclined to agree with McNemar
that (1) an acceptable index of when to stop
factoring must be a function of the size of the
sample upon which the original correlations
and their sampling errors are based, and (2)
it is invalid to compare the standard deviation
of the residuals with the standard error
of the original correlations since this tacitly
assumes that the sampling errors of the original
correlations are uncorrelated. Since all
the intercorrelations are based upon the same
sample, their sampling errors will be correlated.
McNemar¹³ develops a criterion of
when to stop factoring by considering the
similarity between the centroid residuals and
partial correlation coefficients. He derives a
formula for adjusting the standard deviation
of the factorial residuals so that the adjusted
value approximates the standard deviation of
partial correlations. The adjusted standard
deviation is given by

$$\sigma_s = \frac{\sigma_{\text{residuals}}}{1 - M_{h^2}}$$

where

σ_residuals = standard deviation of 171 residuals
M_h² = mean communality of the 19 tests

The residuals can then reasonably be attributed
to sampling errors when σ_s becomes
less than or equal to the standard error of a
zero partial correlation coefficient; i.e., less
than or equal to 1/√N. Since in this study
all correlations are based upon a sample of
N = 24, the critical value of 1/√N is .204.
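As a worked check of the stopping rule, the sketch below applies the adjustment to the standard deviations and mean communalities printed in Table V for the first four factors. The function names are invented for illustration, and the code does not reproduce McNemar's derivation, only the arithmetic of the criterion.

```python
import math

def adjusted_sd(sd_residuals, mean_communality):
    """Standard deviation of the residuals divided by (1 - mean communality)."""
    return sd_residuals / (1.0 - mean_communality)

def factors_needed(sds, communalities, n_cases):
    """First factor count at which the adjusted S.D. falls to 1/sqrt(N) or below."""
    critical = 1.0 / math.sqrt(n_cases)
    for k, (sd, m) in enumerate(zip(sds, communalities), start=1):
        if adjusted_sd(sd, m) <= critical:
            return k
    return None

# Values printed in Table V after extracting factors I through IV.
sds = [0.179, 0.130, 0.100, 0.077]
communalities = [0.309, 0.449, 0.543, 0.613]
adjusted = [round(adjusted_sd(s, m), 3) for s, m in zip(sds, communalities)]
k = factors_needed(sds, communalities, 24)
```

With N = 24 the critical value is about .204, and the adjusted value first drops below it after the fourth factor.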


10 Thurstone, op. cit., pp. 65-70.

11 Mosier, "Influence of Chance Error on Simple
Structure: An Empirical Investigation of the Effect of Chance
Error and Estimated Communalities on Simple Structure in
Factorial Analysis," Psychometrika, IV (March, 1939), pp.
33-44.

12 C. H. Coombs, "A Criterion for Significant Common
Factor Variance," Psychometrika, VI (August, 1941), pp.
267-

13 Q. McNemar, "On the Number of Factors,"
Psychometrika, VII (1942), pp. 9-18.







TABLE IV 
SIGN-CHANGING TABLE FOR CENTROID FACTORIZATION OF R, 


Reflection* of Residuals after Extracting Factor

[Table entries illegible in the source.]


* The x’s indicate the tests that were reflected in the factor residual tables. 


Table V lists the standard deviation of the
residuals, the mean communality of the tests,
and the value of σ_s after the extraction of
each of the six centroid factors.


TABLE V
McNEMAR INDEX OF WHEN TO STOP FACTORING
(Study Number One)

After Factor    S.D. of residuals    Mean communality    σ_s
I               .179                 .309                .259
II              .130                 .449                .236
III             .100                 .543                .219
IV              .077                 .613                .199
V               .051                 .654                .185
VI              .042                 .698                .169


It is seen that σ_s becomes less than the
critical value of .204 after the extraction of
the fourth factor, indicating that four common
factors are sufficient for reproducing the
correlation matrix within discrepancies that
can be attributed to sampling errors. Although
only four factors are indicated by the above
consideration, factors five and six were kept
for inclusion in the process of rotation with
the thought that at least one of these dimensions
might collapse into a residual error
plane in the process of rotation. This procedure
is in accord with the practice of Thurstone,¹⁴
who points out that it is by all means
preferable to rotate a structure in more
dimensions than are required to describe it
most simply than to run the risk of trying to
analyze the problem in fewer dimensions than
are actually necessary.

14 L. L. Thurstone, Primary Mental Abilities, Psychometric
Monographs, No. 1 (Chicago: University of Chicago
Press, 1938).

The resultant centroid factorial matrix F_a
is reproduced in Table VI. The first column
of Table VI designates the test to which each
row refers; the next six columns of F_a show
the factor loadings of the tests on the six
centroid factors. In the seventh column is
listed the sum of squares of the factor loadings
for each test; i.e., the communality, h².
The last column lists the test reliabilities as
reported by Rostker. The bottom row shows
the sum of squares of the loadings in each
column, which is the amount of the total test
variance accounted for by each factor. Since
the total test variance is 19 for the battery
of tests, it is only necessary to divide the sum
of the entries in the last row by 19 to determine
how much of the total test variance is
accounted for by the six common factors.
Doing this indicates that 69.8% of the total
test variance is accounted for by the six
common factors.

The reliability of a test should equal or
exceed its communality since test reliability
is an additive function of both the communality
and specificity of the test. Inspection
of the last two columns of Table VI bears
this out in all instances except two in which
the reliability is .02 less than the communality.
It must be remembered, however, that the
reliability coefficients were not all based on
this group of 24 teachers alone. In the case of
tests (1), (6a), (7a), (8a), and (10), the
communality and reliability are practically
alike, indicating that the specificities of these
tests in this battery are practically zero. All
tests except (13)—BPI(F1) have at least
50% of their test variance accounted for by
the six common factors. The communalities
range from .44 to .94, the mean communality
being .70. On the average then, 70% of the
test variance is accounted for by these common
factors. Another way of saying this is
that on the average only 30% of the test
variance is attributable to sampling errors
and test specificity combined.


TABLE VI


THE CENTROID MATRIX F_a: FACTOR LOADINGS OF 19 TEACHER ABILITIES ON SIX ARBITRARY,
ORTHOGONAL, CENTROID COMMON FACTORS


Centroid Factor
IV V


II III
—. 165 . 878
—. 152 . 440

. 365 . 246

. 808 - 200
—. 331 —. 050

. 575 —. 428

. 159 —. 295
—. 607 . 291
—. 365 —. 224
—. 393 —. 368
—. 493 —. 239

. 532 —. 290

. 309 —. 396

. 079 . 328

. 199 —. 134

. 155 —. 064

-417 413
—.617 —.261

- 147 . 882


2.67 1.79
* No reliability coefficients available.


The six centroid factors reproduce all but
5 of the 171 intercorrelations within one
probable error of the values given in R_a.

The factorial matrix F_a then can be
accepted as one which reproduces the correlation
matrix R_a by the product F_a F_a' well
within the limits of sampling errors of the
original correlations. The six centroid reference
factors are necessarily mutually orthogonal
because of the manner in which factorization
was done.¹⁵

Rotation of the orthogonal reference frame.
—It was pointed out in an earlier section that

15 L. L. Thurstone, Vectors of Mind, p. 97.


. 190 
-111 
. 053 
. 204 
—. 391 
. 270 
. 122 
—. 824 
. 239 
. 197 
- 262 
- 479 
—. 230 
—.076 
—.470 
—. 203 
. 160 
—. 2651 
—. 308 


VI 


—. 087 
. 169 


. 226 


. 134 


there are an infinite number of factorial
matrices that reproduce the correlation matrix
equally well. Matrix F_a obtained in the previous
section is one matrix that reproduces R_a
well within the sampling errors of the correlation
coefficients. With appropriate orthogonal
transformations of F_a, however, an infinite
number of factorial matrices could be generated,
each of which would reproduce R_a as
well as does F_a. The problem here is to transform
F_a such that the resultant factorial
matrix has a maximum number of near-zero
entries, subject to the condition that the
transformation shall be orthogonal; i.e., maintain
the mutual independence of the common
factors.

This problem can be expressed in terms of
its geometrical counterpart thus: (1) each
row in F_a lists the coordinates of a point in
orthogonal six-space, (2) the tests in F_a can
be represented as vectors connecting the
origin with these points, (3) a column in F_a
shows the projections of the 19 test vectors
on one of the six reference axes, (4) the
square root of the test communality, h_j, represents
the length of the test vector in the
common factor six-space, and (5) the intercorrelation
between two tests as listed in
R_a is the inner product between the two corresponding
test vectors; i.e.,

$$r_{jk} = h_j h_k \cos\theta$$







where θ is the angle between the two vectors.
The lengths of the vectors and the angles between
them must remain invariant under any
transformation since these relationships determine
the elements of R_a which any legitimate
factorial matrix must continue to reproduce.
Thus, F_a can be interpreted as representing
the coordinates of 19 points with respect to
six orthogonal axes. The configuration of 19
points must remain invariant, but the six
orthogonal axes can be rotated with respect
to this configuration. Each rotation of the
axes changes the projections of the 19 test
vectors but leaves the test configuration unchanged.
The problem of maximizing the
number of near zeros in the factorial matrix
then becomes the problem of rotating the six
axes such that as many of the vectors as possible
have near-zero projections on the axes.
Another way of saying this is that the axes
are to be rotated such that the points fit the
axes best.
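The invariance described here can be checked numerically. In the sketch below a hypothetical two-factor matrix F is rotated through an arbitrary angle; the projections change, but the product of the matrix with its transpose, and hence every reproduced correlation and communality, does not. The helper names are assumptions made for the example.

```python
import math

def matmul(A, B):
    """Product of two matrices held as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def rotation_2d(degrees):
    """Orthogonal matrix that rotates two reference axes in their plane."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s], [s, c]]

# Hypothetical loadings of three tests on two orthogonal factors.
F = [[0.6, 0.3],
     [0.5, 0.5],
     [0.2, 0.7]]
G = rotation_2d(35)
V = matmul(F, G)                        # rotated loadings
R_before = matmul(F, transpose(F))      # reproduced correlations before
R_after = matmul(V, transpose(V))       # and after rotation
```

The two reproduced tables agree to within rounding error because G times its transpose is the identity matrix.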


Before considering the process of rotation, 
consider an index of goodness of fit¹⁶ that can
be used to determine which of several rota- 
tions provides the best fit of the points to the 
axes. The usual criterion of best fit in fitting 
curves to points is to determine the curve 
parameters such that the sum of squares of 
deviations of the points from the curve is 
minimized. This, however, cannot be applied 
in this problem because the sum of squares 
of the deviations (projections) of a point 
from the axes corresponds to the square of 
the vector length and must remain invariant. 

The best conceivable fit of the points to
the axes and the one that would lead to the
maximum number of zeros in the factorial
matrix would be the case in which each test
vector is collinear with one of the reference
axes. In this ideal case, each row of the factorial
matrix would have only one non-zero
entry, and this entry would be exactly equal
to the length of the test vector. The sum of
the absolute values of the loadings in the factorial
matrix would then be equal to the sum
of the lengths of the test vectors. Call this
total length h_t. As the axes are rotated away
from this perfect fit, the test vectors will have


16 The basis of the index to be developed here was suggested
by Thurstone but was not developed (op. cit.). Common
practice in reporting factor studies is not to report
any index of the goodness of fit of the common factor solution.
It seems to the writer that the more widespread use
of such an index in reporting studies would furnish a valuable
basis for comparing several common factor solutions and
tend to bring the several schools of thought in this field
closer together.




projections on more than one axis each. The
sum of the squares of the projections will not
change as this is done, but the sum of the
absolute values of the projections of all of
the vectors on all of the axes will increase.
This follows from the theorem:

$$\text{If } a_1^2 + a_2^2 + a_3^2 + \cdots + a_n^2 = h^2, \text{ then } |a_1| + |a_2| + |a_3| + \cdots + |a_n| \ge h.$$


This suggests that the proximity of the sum
of the absolute values of the elements of the
factorial matrix to the limiting value h_t can
be used as an indication of the goodness of
fit of the points to the axes. The index that
will be used here is the ratio of the sum of
the absolute values of the factor loadings to
the limiting value h_t; i.e.,

$$\text{Index of structure fit} = \mathrm{ISF} = \frac{\sum_{j=1}^{n}\sum_{k=1}^{r} |a_{jk}|}{h_t}$$

where
a_jk = projection of the jth test on the kth axis
n = number of tests
r = number of axes
h_t = length of all the test vectors;

$$h_t = \sum_{j=1}^{n}\sqrt{\sum_{k=1}^{r} a_{jk}^2}$$

An index of fit can also be determined for
each test separately, thus:

$$\text{Index of test fit} = \mathrm{ITF} = \frac{\sum_{k=1}^{r} |a_{jk}|}{h_j}$$

where h_j = the length of test vector j; i.e.,

$$h_j = \sqrt{\sum_{k=1}^{r} a_{jk}^2}$$

and a_jk and r have the same meanings as
above.

The minimal value of ISF or ITF is unity 
and can be obtained only if the test vectors 
are colinear with the reference axes. The 




smaller the index the better the fit, and the
larger the index the poorer the fit. The maximum
value the index could assume would
correspond to the case in which each test
vector would make equal angles with all of
the axes. This maximum value for either ISF
or ITF can easily be shown to be equal to
√r, where r is the number of common
factors.
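The two limits stated for the index can be obtained from elementary inequalities. The following is only a sketch in the notation of this section, written for a single test vector with projections a_1, ..., a_r and length h; it is not taken from the original paper.

```latex
% Lower bound: squaring the sum of absolute values adds only
% non-negative cross-product terms to the sum of squares.
\Bigl(\sum_{k=1}^{r} |a_k|\Bigr)^{2}
  = \sum_{k=1}^{r} a_k^{2} + 2\sum_{k<l} |a_k|\,|a_l|
  \;\ge\; \sum_{k=1}^{r} a_k^{2} = h^{2},
% with equality only when at most one projection is non-zero.

% Upper bound: the Cauchy-Schwarz inequality applied to
% (|a_1|, ..., |a_r|) and (1, ..., 1).
\sum_{k=1}^{r} |a_k| \cdot 1
  \;\le\; \sqrt{\sum_{k=1}^{r} a_k^{2}}\;\sqrt{r} = h\sqrt{r},
% with equality only when all |a_k| are equal.

% Hence, for every test,
1 \;\le\; \mathrm{ITF} = \frac{\sum_{k}|a_k|}{h} \;\le\; \sqrt{r}.
```

Since ISF is a weighted mean of the separate test indices, with weights proportional to the vector lengths, it lies between the same two limits.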

The index of structure fit ISF for the centroid
matrix F_a was calculated and found to
be equal to 32.758/15.804 = 2.073, where 15.804
equals the sum of all the vector lengths and
32.758 equals the sum of the absolute values
of the entries in F_a. The maximum value for
this index in six-space is equal to √6 = 2.45.
The object of the rotation of F_a will be to
reduce the value of ISF by suitable rotations
to its minimal value.
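Both indices are simple to compute from a factorial matrix. The sketch below uses invented function names and two small hypothetical matrices that reach the limiting values; the last two lines only repeat the arithmetic quoted in the text.

```python
import math

def index_of_test_fit(row):
    """ITF: sum of absolute loadings of one test over its vector length."""
    return sum(abs(a) for a in row) / math.sqrt(sum(a * a for a in row))

def index_of_structure_fit(matrix):
    """ISF: sum of all absolute loadings over the sum of vector lengths."""
    total_abs = sum(abs(a) for row in matrix for a in row)
    total_length = sum(math.sqrt(sum(a * a for a in row)) for row in matrix)
    return total_abs / total_length

# Hypothetical matrices: every vector on an axis (best fit) and every
# vector at equal angles to both axes (poorest fit for two factors).
perfect = [[0.8, 0.0], [0.0, 0.6]]
poorest = [[0.5, 0.5], [0.3, 0.3]]
isf_perfect = index_of_structure_fit(perfect)
isf_poorest = index_of_structure_fit(poorest)

# Arithmetic reported for the centroid matrix.
isf_centroid = round(32.758 / 15.804, 3)
isf_maximum_six_space = round(math.sqrt(6), 2)
```

For two factors the poorest value is the square root of 2; for the six centroid factors it is the 2.45 quoted above.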

The writer spent considerable time and
effort in attempting to arrive at a transformation
of F_a that would minimize ISF by
analytic means, but these efforts up to the
present time have been unsuccessful.¹⁷ The
problem is to find an orthogonal transformation
matrix G_a which, when multiplied on the
right of F_a, will produce a rotated factorial
matrix V_a for which ISF will be a minimum;
i.e.,

$$F_a G_a = V_a$$

where F_a is the given (n×r) centroid matrix,
G_a is the desired (r×r) orthogonal
transformation matrix,
and V_a is the (n×r) rotated factorial matrix
for which ISF is to be a minimum.


Since analytic means for the solution of
this problem are not available, the solution
was carried out by graphical means described
by Thurstone¹⁸ and extended in some respects
by the writer. The graphical procedure as
described by Thurstone for the case of three
common factors consists briefly of (1) extending
the vector lengths to unity, (2)
plotting the points in the so augmented factorial
matrix on the surface of a sphere, (3)
locating new axes on the surface of the sphere
graphically such that the number of near-zero
projections on the three axes is maximized,

™ The writer wishes to 


hemati 
University of Wisconsin, especially W. C. 
suggestions and collaboration in connection with 
tional problem. 


18 L. L. Thurstone, Vectors of Mind, pp. 164-170.
L. L. Thurstone, Primary Mental Abilities, pp. 73-78.




(4) writing the transformation matrix G by
entering for its elements the direction cosines
of the new axes with respect to the old, and
(5) applying the resultant G to F. In the case
of more than three common factors, Thurstone
(1) plots the n points in all possible
two-dimensional sections, (2) selects those
partial rotations from these two-dimensional
plots which will improve the fit most, (3)
applies this partial rotation to F, (4) replots
in two-dimensional sections the partially
rotated F, and (5) repeats steps 2, 3, and 4 until
(6) no further improvement of fit is indicated
by the two-dimensional sections. The
single transformation matrix G that will
rotate F into the final rotated form V is the
product of the partial rotations; i.e.,

$$G = G_1 G_2 G_3 \cdots G_p$$

Thurstone writes,¹⁹ "The graphical method of
rotating in one plane at a time is still probably
the best single method." Other methods
of rotation described by Thurstone²⁰ and
Tucker²¹ do not readily permit the investigator
to preserve the orthogonality of the
reference axes.
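The bookkeeping of rotating one plane at a time can be illustrated as follows. The loadings, the pairs of axes, and the angles are hypothetical; choosing the angles is the graphical step and is not attempted here. The helper names are assumptions made for the example.

```python
import math

def identity(r):
    return [[1.0 if i == j else 0.0 for j in range(r)] for i in range(r)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def plane_rotation(r, p, q, degrees):
    """r x r orthogonal matrix that rotates axes p and q (0-based)
    through the given angle and leaves the other axes fixed."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    G = identity(r)
    G[p][p], G[p][q] = c, -s
    G[q][p], G[q][q] = s, c
    return G

def compose(partials):
    """Product of the partial rotations in the order they were applied."""
    G = identity(len(partials[0]))
    for G_i in partials:
        G = matmul(G, G_i)
    return G

# Hypothetical three-factor loadings and two partial rotations.
F = [[0.6, 0.3, 0.2], [0.5, 0.5, 0.1], [0.2, 0.7, 0.4]]
partials = [plane_rotation(3, 0, 1, 30), plane_rotation(3, 1, 2, -20)]
V_stepwise = F
for G_i in partials:
    V_stepwise = matmul(V_stepwise, G_i)   # apply one plane at a time
G = compose(partials)
V_direct = matmul(F, G)                    # apply the single product matrix
```

Applying the partial rotations in turn and applying their product once give the same rotated matrix, and the product of orthogonal matrices is itself orthogonal, which is why rotation in one plane at a time preserves the orthogonality of the reference axes.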


The present problem is concerned with
rotating a six-dimensional structure F_a. The
writer extended the usual graphical methods
by plotting three-dimensional sections of six-dimensional
F_a on the surface of a blackboard
sphere eighteen inches in diameter for the
first two partial rotations. In order to do this,
the fraction of the vector lengths in the three
dimensions chosen to be plotted together had
to be stretched to unit length so that the terminal
point of the partial vector extended to
the surface of the sphere. After having
plotted the 19 points on the sphere, a suitable
rotation of all three axes could be found
by moving about on the sphere a spherical
right triangle, which was constructed of wire,
and noting the position of the right triangle
for which the number of near-zero projections
in this three-dimensional subspace was maximized.
The vertices of the spherical right
triangle correspond to the mutually orthogonal
rotated axes in this subspace. The elements
of the partial rotation G_ai could then

19 L. L. Thurstone, Primary Mental Abilities, pp. 72-73.

20 L. L. Thurstone, "The Bounding Hyperplanes of a Configuration
of Traits," Psychometrika, I (1936), pp. 61-68.
L. L. Thurstone, "A New Rotational Method in Factor
Analysis," Psychometrika, III (December, 1938), pp. 199-218.

21 L. R. Tucker, "A Semi-Analytical Method of Factorial
Rotation to Simple Structure," Psychometrika, IX
(1944), pp. 43-68.







be written as the direction cosines of the
rotated axes with respect to the old axes.


TABLE VII

THE PARTIAL ROTATION TRANSFORMATION
MATRICES G_ai

[Most entries of this table are illegible in the source; the recoverable fragments follow.]

.906   .193   .375
.422  -.503  -.755
.085   .842  -.538

.966   0
0      .543
0      0
0     -.685
0     -.487
-.259  0

.816
.568  -.543  -.617
.103   .793  -.600

.276

.508


In the first rotation, axes I, II, and III of
F_a were plotted together on one hemisphere
of the globe. The direction cosines of the
rotation indicated for this subspace are listed
in the intersections of rows 1, 2, and 3 with
columns 1, 2, and 3 in partial rotation G_a1 of
Table VII. By reflecting all tests in F_a
which have negative projections on centroid
factor IV and extending the lengths of
the partial vectors in the 4, 5, 6 subspace of
F_a to unity, the configuration of the 19 points
in the 4, 5, 6 subspace was plotted on the
other hemisphere of the globe. Again moving
the wire triangle to the position of best fit,
the partial rotation indicated by the direction
cosines shown in the intersections of rows 4,
5, and 6 with columns 4, 5, and 6 in G_a1
could be written. G_a1 then comprises the first
partial rotation of F_a and was applied to
transform F_a into partially rotated V_a1. The
elements in the first column of G_a1 consist of
the direction cosines of the new axis I' with
respect to the centroid axes I through VI; the
second column lists the direction cosines of
II' with respect to the six centroid axes, etc.
That G_a1 is orthogonal, apart from negligible
deviations due to small errors in measuring
angles on the surface of the sphere, can
be verified by computing the matrix product
G_a1'G_a1, in which the diagonal elements are
unity and the non-diagonal elements are
zeros. The application of partial transformation
G_a1 reduced the index of fit ISF from
2.073 to 1.935.
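The orthogonality check described here can be sketched as below. The matrix is hypothetical: it stands for direction cosines read to three decimal places, so the product of its transpose with itself departs slightly from the identity matrix. The function name is invented for the example.

```python
def max_orthogonality_error(G):
    """Largest absolute departure of G'G from the identity matrix."""
    r = len(G)
    worst = 0.0
    for i in range(r):
        for j in range(r):
            dot = sum(G[k][i] * G[k][j] for k in range(r))  # element of G'G
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(dot - target))
    return worst

# Hypothetical direction cosines rounded to three places.
G_measured = [[0.906, -0.423, 0.0],
              [0.423, 0.906, 0.0],
              [0.0, 0.0, 1.0]]
error = max_orthogonality_error(G_measured)

# A matrix that is clearly not orthogonal, for contrast.
error_bad = max_orthogonality_error([[1.0, 0.1], [0.0, 1.0]])
```

A departure of a few units in the fourth decimal place is of the order expected from rounding and from errors of measurement on the globe; a departure as large as that of the second matrix would call for adjusting the direction cosines.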

The second partial rotation was accomplished
by plotting the 1, 3, 6 subspace of
V_a1 on one hemisphere of the globe and the
2, 4, 5 subspace on the other hemisphere. The
resultant partial transformation G_a2 is listed
in Table VII. The application of this partial
rotation reduced the index of fit ISF to
1.850.

After these first two partial transformations
based on three-dimensional sections, rotations
were carried out in two-dimensional sections.
This was done after the plotting of several
three-dimensional sections of V_a2 showed no
apparent major rotations, and it appeared
that subsequent improvements of fit would
call for smaller rotations that could be accomplished
more expediently in two-dimensional
sections.²²


22 The globe used, instead of being a perfect sphere, was an oblate
spheroid. This fact necessitated some small adjustments of the
direction cosines after the angles between new and old axes
had been measured in order to render G_ai as nearly orthogonal
as possible. For large rotations the additional work of adjusting
the direction cosines was tolerable, but for smaller rotations
orthogonality could be more simply maintained in two-dimensional
rotations.





TABLE VIII
SETS OF AXES ROTATED IN EACH PARTIAL ROTATION OF Fa AND Vai

[Table body not legible in the source. Rows: partial rotations Ga1 through Ga9, the last being an interchange of columns. Columns: axes I through VI and the index ISF after each partial rotation.]


The sets of axes which were rotated together in each partial rotation are listed in Table VIII. Axes rotated as a set are labeled with the same letter in the corresponding row. Cells which are empty indicate axes for which no apparent rotation was indicated in the given Gai. All partial rotations Gai are listed in Table VII.

In the course of rotating pairs of axes, it occasionally happened that the writer selected a rotation which led to an increase in the index ISF. Such rotations were discarded. Each rotation for a set of axes indicated in the above table led to a decrease of ISF for that set. The last column in the table lists the index of structure fit ISF after each partial rotation. Although the index decreased consistently, the amount by which it diminished per rotation decreased as more rotations were made. After rotation Ga8 no further improvements of fit could be found, although several were tried. The matrix Ga9 was applied merely to interchange the columns of the factorial matrix to facilitate subsequent discussion.

The product of the nine partial rotations Gai results in the single transformation matrix Ga, which will rotate Fa into the final rotated matrix directly; i.e.,

Ga = Ga1 Ga2 Ga3 Ga4 Ga5 Ga6 Ga7 Ga8 Ga9

This transformation matrix is listed in Table IX. In the lower half of Table IX, the matrix product Ga'Ga is given. If Ga is orthogonal, this product should have unity in all diagonal cells and zeros elsewhere. Inspection of the product Ga'Ga reveals that all diagonal entries are unity if rounded to two decimal places, and all but four of the fifteen distinct non-diagonal entries round to zero for two decimal places, the four exceptions rounding to .01. Since arc cos .01 = 89° 26', the four non-zero entries indicate pairs of axes that are approximately one-half degree off from perfect orthogonality. For all practical purposes, then, the transformation matrix Ga can be considered orthogonal.
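The count of non-zero entries and the angle quoted above can be rechecked with a few lines of arithmetic. The sketch below takes the fifteen distinct non-diagonal entries of the product as they can be read from Table IX (the keys are pairs of axis numbers, an arrangement chosen here for convenience), counts those that do not round to zero at two decimals, and converts a scalar product of .01 into degrees and minutes.

```python
import math

# Non-diagonal entries of the matrix product as legible in Table IX,
# one entry for each pair of axes.
off_diagonal = {
    (1, 2): .012, (1, 3): .000, (1, 4): .009, (1, 5): -.006, (1, 6): -.003,
    (2, 3): -.003, (2, 4): .003, (2, 5): -.002, (2, 6): -.003,
    (3, 4): .002, (3, 5): -.002, (3, 6): -.003,
    (4, 5): .000, (4, 6): .001, (5, 6): .010,
}

def angle_between_axes(cosine):
    """Angle, as (degrees, minutes), whose cosine is the given scalar product."""
    total_degrees = math.degrees(math.acos(cosine))
    degrees = int(total_degrees)
    minutes = round((total_degrees - degrees) * 60)
    return degrees, minutes

# Pairs of axes whose scalar product does not round to zero at two decimals.
nonzero_pairs = [pair for pair, value in off_diagonal.items()
                 if abs(round(value, 2)) > 0]

print(len(off_diagonal), len(nonzero_pairs))
print(angle_between_axes(0.01))
```

On these figures the four exceptional pairs are axes (1, 2), (1, 4), (1, 5), and (5, 6).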


TABLE IX
THE ORTHOGONAL TRANSFORMATION MATRIX Ga AND THE MATRIX PRODUCT Ga'Ga

Ga
  .889   .461   .016  -.023  -.050  -.033
  .270  -.575  -.571  -.293  -.421  -.034
  .259  -.476   .371  -.376   .557   .340
 -.126   .326  -.703  -.206   .555    ...
  .195  -.330  -.153   .561   .444    ...
  .110  -.127  -.131   .644  -.078    ...

Ga'Ga
  .996   .012   .000   .009  -.006  -.003
  .012  1.001  -.003   .003  -.002  -.003
  .000  -.003   .999   .002  -.002  -.003
  .009   .003   .002   .999   .000   .001
 -.006  -.002  -.002   .000  1.001   .010
 -.003  -.003  -.003   .001   .010  1.002

... Entry not legible in the source.








The application of Ga to Fa produces Va, the rotated orthogonal common factor matrix which it was the purpose of this analysis to isolate. Va is listed in Table X, and has been modified from the product FaGa in that tests (10), (14), (15), and (16) have been reflected; i.e., all signs in these rows have been reversed. These test reflections were undertaken in order to render the corresponding test projections essentially positive.

The index of fit ISF for Va is 1.764 as against the poorer fit of Fa for which ISF equalled 2.073. In discussing the extent to which the number of zeros has been maximized, it is first necessary to define what is to be taken as a near zero. Thurstone** calls factor loadings whose absolute values are equal to or less than .2 near-zeros, since such loadings correspond to only 4% of the total


**L. L. Thurstone, Primary Mental Abilities, p. 79. 













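TABLE X
THE ROTATED ORTHOGONAL FACTORIAL MATRIX Va

                              Common Factor
Test    Code            GKMA    TRS    PESA   EATP
(1)     ACPE            .765    ...    .090  -.081
(2a)    TCPE (1934)     .689    ...    .105   .152
(12)    AORM            .773    ...    .160  -.137
(3)     ACCGT           .789    ...    .270  -.052
(13)    BPI (Fc)        .182    ...    .499    ...
(14)*   BPI (Fs)        .173    ...    .685    ...
(4)     MTIL            .554    ...    .322    ...
(5)     WSAI            .114    ...    .672    ...
(6a)    TDRS (E)        .505    ...    .014    ...
(7a)    ASRS (E)        .369    ...    .000    ...
(8a)    MTRS (E)        .350    ...    .062    ...
(15)*   SEAT (T-A)      .171    ...    .760    ...
(16)*   SEAT (A-R)      .095    ...    .140    ...
(17)    SASST (Tot.)    .664    ...    .160    ...
(18)    TPMH            .726    ...    .104    ...
(9)     OTFAE           .592    ...    .051    ...
(10)*   ATTP            .303    ...    .089    ...
(19)    TPPT            .025   .405    .380    ...
(11a)   PGTA (36-37)    .648  -.109    .308    ...

Sum of Squares           ...   2.78    2.28    ...

* Test reflected after the product FaGa was obtained.
... Entry not legible in the source. The columns for the fifth and sixth factors are likewise not legible.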


test variance. The writer has arbitrarily chosen to designate as near-zeros any loadings that account for less than 5% of the total test variance; i.e., loadings whose absolute values are less than .224. The number of near zeros in Va then is 73, while Fa had only 52 near zeros. Thus, the rotation of Fa led to an increase of 21 in the number of zeros. The 73 zeros represent 64% of the 114 cells in Va.


INTERPRETATION OF THE FACTORIAL MATRIX Va

The common factors. The interpretation of the common factors in Va can be made by analyzing each column of Va to discover which tests have significantly large loadings on the factor and which have near-zero loadings. Regarding the minimum loading which shall be taken as significant in naming a factor, Thurstone** writes, "We have not regarded a projection as significant in naming a factor unless it is as large as .40. The naming of a factor cannot be made with confidence unless the projections are as large as .50 or .60 so that the factor accounts for a fourth or a third of the variance of a test." The writer has chosen a factor loading of .50; i.e., 25% of the total test variance, as the minimal loading to use in identifying the common factors. Thus, zero loadings are those which account for less than 5% of the total test variance and significant loadings are those that represent at least 25% of the test variance. The resultant factor pattern is reproduced in Table XI where X's indicate significant and O's indicate zero loadings in the above sense. Cells left blank indicate intermediate values. None of the significant loadings is negative,*** indicating that the 19 abilities are essentially contained in the positive manifold. The complexity of all tests except (6a) is one, and (6a) has a complexity of two. The first four common factors have 10, 3, 4, and 2 significant loadings respectively. Common factor 5 has no significant loading on it, and factor 6 has only one significant loading.

** L. L. Thurstone, Primary Mental Abilities, p. 79.
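The two cut-off values can be applied mechanically to a column of loadings. The sketch below does so for the first-factor (GKMA) loadings, on the assumption that the first numeric column of Table X is the GKMA column and that its rows follow the test order of Table XI. The function name and the marks returned are illustrative only.

```python
import math

NEAR_ZERO = math.sqrt(0.05)   # about .224, i.e. under 5% of the test variance
SIGNIFICANT = 0.50            # 25% of the test variance

# GKMA loadings keyed by test number, as read from Table X.
gkma = {
    "1": .765, "2a": .689, "12": .773, "3": .789, "13": .182, "14": .173,
    "4": .554, "5": .114, "6a": .505, "7a": .369, "8a": .350, "15": .171,
    "16": .095, "17": .664, "18": .726, "9": .592, "10": .303, "19": .025,
    "11a": .648,
}

def classify(loading):
    """Return 'X' (significant), 'O' (near zero), or '' (intermediate)."""
    if loading >= SIGNIFICANT:
        return "X"
    if abs(loading) < NEAR_ZERO:
        return "O"
    return ""

pattern = {test: classify(value) for test, value in gkma.items()}
significant = [test for test, mark in pattern.items() if mark == "X"]
zeros = [test for test, mark in pattern.items() if mark == "O"]

print(len(significant), len(zeros))
print(round(NEAR_ZERO, 3))
```

Only positive loadings are treated as significant here, in keeping with the statement that none of the significant loadings is negative.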

Before interpreting the common factors, it is important to consider the direction of each of the tests as they appear in Tables X and XI. It will be recalled that tests (13) and (15) were reflected prior to the extraction of the first centroid factor and that tests (10), (14), (15), and (16) were reflected in writing Va from FaGa. Since test (15) was twice reflected, it now has its original sense in which high scores indicate teaching rather than administrative tendencies. The reflection of (14) makes high scores desirable; i.e., high scores now indicate sociability. The reflection of (16) makes high scores indicative of research rather than administrative tendencies. The reflection of (10) makes high scores correspond to the eulogizing attitude originally intended by Yeager to be indicative of good teachers. The direction of each of the 19 tests in Va is indicated in the last column of Table XI where the high score end of each trait continuum as it now stands is briefly labeled.

*** In Va only six loadings are less than -.224, the smallest of these being -.364 which corresponds to 13% of the test variance.

TABLE XI
THE COMMON FACTOR PATTERN OF 19 TEACHER ABILITIES IN TERMS OF ZEROS AND SIGNIFICANT LOADINGS

No.     Code            High Scores Indicate
(1)     ACPE            Intelligence
(2a)    TCPE (1934)     Intelligence
(12)    AORM            Knowledge
(3)     ACCGT           Knowledge
(13)    BPI (Fc)        Wholesome self-confidence
(14)    BPI (Fs)        Sociability
(4)     MTIL            Leadership
(5)     WSAI            Social adjustment
(6a)    TDRS (E)        High ratings
(7a)    ASRS (E)        High ratings
(8a)    MTRS (E)        High ratings
(15)    SEAT (T-A)      Teaching
(16)    SEAT (A-R)      Research
(17)    SASST (Tot.)    Informed liberalism
(18)    TPMH            Knowledge
(9)     OTFAE           Knowledge
(10)    ATTP            Eulogizing attitude
(19)    TPPT            Sound practices
(11a)   PGTA (36-37)    Large pupil gains

[The X and O entries under the common factors are not legible in the source; the significant and zero loadings are listed under each factor in the text.]

The first common factor: (GKMA) general knowledge and mental ability. The tests having significant and zero loadings on the first common factor are as follows:

Significant Loadings
(3) ACCGT
(12) AORM
(1) ACPE
(18) TPMH
(2a) TCPE (1934)
(17) SASST (Tot.)
(11a) PGTA (36-37)
(9) OTFAE
(4) MTIL
(6a) TDRS (E)

Zeros
(5) WSAI
(13) BPI (Fc)
(14) BPI (Fs)
(15) SEAT (T-A)
(16) SEAT (A-R)
(19) TPPT

Aside from (4), (11a), and (6a), the tests having significant loadings on this factor are concerned with the teacher's knowledge of subject matter, general knowledge, understanding of the principles of mental hygiene, mental ability, and informed liberalism. The five tests which are dependent upon this factor for at least 48% of their test variance include the two tests of subject matter in the social studies and the two intelligence tests. It would appear, therefore, that this factor is concerned with the teacher's general knowledge in a great variety of fields and her general intelligence or mental maturity. The factor has been named: General Knowledge and Mental Ability (GKMA). The tests involving the teacher's personal, emotional and social adjustment, the two SEAT measures and the Teaching Problems and Practices Test are statistically independent of this factor. The appearance of test (4) MTIL with a significant loading on factor GKMA indicates that leadership as defined by Morris in this test calls for responses similar to those of widely informed and intelligent persons. The loading of .65 of (11a) PGTA indicates that the teacher whose pupils gain most in information, skill, and attitudes in the field of social studies is the teacher who has much of common factor GKMA. Only one of the teacher-rating scales has a significant loading on this factor; i.e., .50, but a much larger part of the variance of this rating scale is attributable to the second common factor.
The second common factor: (TRS) teacher rating scale factor. The significant and zero loadings for this factor are as follows:

Significant Loadings
(8a) MTRS (E)
(7a) ASRS (E)
(6a) TDRS (E)  .82

Zeros
(2a) TCPE (1934)
(12) AORM
(3) ACCGT
(14) BPI (Fs)
(5) WSAI
(15) SEAT (T-A)
(16) SEAT (A-R)
(17) SASST (Tot.)
(18) TPMH
(9) OTFAE
(10) ATTP
(11a) PGTA (36-37)

This common factor clearly is related to supervisory ratings of teachers and to little else. Only one other loading exceeds .4; namely, (19) TPPT, .405. This indicates that evaluations of teachers based upon rating scales are unrelated to the amounts pupils learn, unrelated to the teacher's knowledge of subject matter, unrelated to the teacher's mental ability, and unrelated to the social and emotional adjustment of the teacher. This Teacher Rating Scale factor (TRS) depends perhaps principally upon whether the rater likes the teacher or not. Whatever it is that the rater bases his ratings on, the ratings clearly seem to be unrelated to quite a variety of teacher traits or the progress of pupils. Since the ratings seem to be independent of the teacher traits measured or the growth of her pupils, it is possible that they are determined predominantly by the beliefs, attitudes, whims, or idiosyncrasies of the rater as he observes minor, irrelevant, and extraneous details of the teacher or her pupils in the classroom.

The third common factor: (PESA) personal, emotional and social adjustment. The significant and zero loadings on this factor are:

Significant Loadings
(15) SEAT (T-A)
(14) BPI (Fs)
(5) WSAI
(13) BPI (Fc)

Zeros
(1) ACPE
(2a) TCPE (1934)
(12) AORM
(6a) TDRS (E)
(7a) ASRS (E)
(8a) MTRS (E)
(16) SEAT (A-R)
(17) SASST (Tot.)
(18) TPMH
(9) OTFAE
(10) ATTP

Three of the four significant loadings on this factor are concerned with the teacher's personal adjustment within herself. The high loading of (15) SEAT (T-A) on this factor suggests that high teaching scores are made on this test by persons who are emotionally and socially well-adjusted. The zero loadings of the rating scales indicate that the personality of the teacher as she herself feels is unrelated to the evaluation the rater makes of the teacher. This factor likewise appears to be independent of the teacher's knowledge or her mental ability. On the basis of the above loadings, it appears reasonable to call this factor the Personal, Emotional and Social Adjustment (PESA) of the teacher.

The fourth common factor: (EATP) eulogizing attitude toward the teaching profession. Only two of the tests have significant loadings on this factor; i.e.,

(10) ATTP  .77
(19) TPPT  .56

Fifteen of the remaining seventeen loadings are in the zero range, and the remaining two are less than .4 in absolute value. This factor is thus best characterized by test (10) ATTP which is simply a scale of attitudes toward the teaching profession. The positive loading of this test indicates that this factor represents a Eulogizing Attitude Toward the Teaching Profession (EATP). The significant loading of (19) TPPT on this factor indicates that teachers possessing this eulogizing attitude tend to handle pupil behavior problems in a manner consistent with the principles of sound mental hygiene. Persons possessing this factor to a high degree might be described as possessing an attitude "Teachers are good, pupils are good" whereas the attitude of those not possessing it might be described by the phrase "Teachers are bad, pupils are bad." Thus, this factor is perhaps concerned with a basically sympathetic attitude toward people.

The fifth factor—None of the tests has a 
significant loading in the fifth column of Va.
Thirteen of the 19 loadings are zeros. The 
factor represents only 5.5% of the total test 
variance of all of the tests. The only two 
loadings exceeding .4 are found for (1) 
ACPE: .45 and (2a) TCPE (1934): .49 
suggesting that this factor may be related to 
a portion of the intelligence tests not 
accounted for by the general information 
aspect of common factor GKMA. Perhaps, 
the factor is concerned with native ability, 
but since the loadings are not sufficiently high 
to name the factor with any confidence, it 
shall remain unnamed here. 

The sixth factor—One significant loading 
only is found on this factor; i.e., (16)
SEAT(A-R), .58. Sixteen of the remaining 
18 loadings are zero. The factor accounts for 
only 4.4% of the variance of the 19 tests. 
This is the only factor upon which (16) 
SEAT(A-R) has a significant loading. This 
indicates that tendencies toward research 
rather than administration as measured by 
this test are unrelated to the common factors 
GKMA, TRS, PESA, and EATP. The tend- 
ency toward research measured by this test 
thus has little in common with the other 18 
measures. For this reason, this factor does 
not appear to be a part of the common factor 
solution of this battery of tests. No attempt 
will therefore be made to interpret or name 
it as a common factor in this analysis. 

Common factor conclusions—From the 
foregoing it appears that only four common factors emerge from this analysis with sufficient clarity for identification. The remaining two appear to be either specific factors or residual error planes. It will be recalled from section III that four common factors were predicted by the McNemar criterion of when to stop factoring. Consequently, the first four columns of Va probably represent the meaningful common factor pattern underlying the correlation matrix Ra. Since these four common factors are orthogonal to each other, they are statistically independent in the teacher sample upon which this study is based.

The common factor structure of the 19 abilities and the 4 common factors represents a positive orthogonal simple structure as defined by Thurstone.* This follows since (1) all significant factor loadings are positive, (2) the common factors are mutually orthogonal, (3) each row of Va (Cols. 1-4) has at least one zero, (4) each column of Va (Cols. 1-4) has at least 4 zeros, and (5) for every pair of columns of Va (Cols. 1-4) there are at least 4 traits whose loadings vanish in one column but not in the other.

Since there are only 19 significant loadings 
of the several abilities on these four common 
factors, only 19 non-zero numbers need to be 
retained by the mind in comprehending the 
interrelationships of the 19 abilities for the 
24 teachers. Thus, what was originally represented
by 24 × 19 = 456 teacher scores,
then reduced to 171 correlation coefficients,
can now be grasped in terms of 19 non-zero 
entries arranged in four columns. The 19 
teacher abilities need no longer be thought of 
as 19 interrelated traits but can be consid- 
ered in terms of four independent common 
factors. 

The tests—Since the complexity of all 
tests but one is unity, the question of which 
common factors are measured by each test 
has already been answered in Table XI 
where the significant loadings under each 
common factor indicate the common factor 
measured by those tests. Each row of Table 
XI shows the factor composition of one test. 
The only measure of complexity two is (6a) 
TDRS(E) for which 67% of the variance is 
based on factor TRS and 26% on GKMA. 
For the remaining two rating scales, these 
percentages are (7a) ASRS(E), 75% and 
14%; and (8a) MTRS(E), 80% and 12% 
respectively. Thus, it is seen that, although 
the rating scales measure predominantly the 
factor TRS which is common to only the 
rating scales, there is a consistent trend for 
a fraction of the remaining variance to be 
associated with the teacher’s general knowl- 
edge GKMA. 
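The percentages quoted for the three rating scales are squared factor loadings expressed as percents. A brief sketch, assuming the GKMA loadings of the rating scales are those read from Table X, recovers the GKMA percentages and also shows the size of the TRS loading that each stated TRS percentage implies.

```python
import math

# GKMA loadings of the three rating scales, as read from Table X.
gkma_loading = {"6a TDRS(E)": .505, "7a ASRS(E)": .369, "8a MTRS(E)": .350}
# Share of variance on TRS (percent) as stated in the text.
trs_percent = {"6a TDRS(E)": 67, "7a ASRS(E)": 75, "8a MTRS(E)": 80}

def percent_of_variance(loading):
    """Squared loading expressed as a whole percent of the test variance."""
    return round(100 * loading ** 2)

def implied_loading(percent):
    """Loading (two decimals) that would account for the given percent."""
    return round(math.sqrt(percent / 100), 2)

for scale in gkma_loading:
    print(scale,
          percent_of_variance(gkma_loading[scale]),
          implied_loading(trs_percent[scale]))
```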

The pupil gain criterion of teaching ability (11a) PGTA (36-37) has a significant loading on only the GKMA factor, indicating that in this study teachers with much general knowledge in many fields plus above-average mental ability are the best teachers in promoting pupil growth. Measure (11a) PGTA (36-37) on the other hand has a zero loading on the rating scale factor TRS. These findings indicate teacher-rating scales to be a very poor substitute for the measurement of pupil gain as far as getting at the real outcomes of education is concerned.

* L. L. Thurstone, Vectors of Mind, pp. 150-166.


CONCLUSIONS


This analysis indicates that four common 
factors are sufficient to account for the inter- 
relationship of the 19 teacher abilities inves- 
tigated. These four factors can be termed the 
primary teacher factors underlying this bat- 
tery of teacher abilities. The four primary 
factors are mutually independent of each 
other in the sample of 24 teachers upon which 
this study is based. 

The four orthogonal primary traits are: 


GKMA, General Knowledge and Mental Ability

TRS, Teacher Rating Scale Factor

PESA, Personal, Emotional and Social Adjustment

EATP, Eulogizing Attitude Toward the Teaching Profession


All but one of the 19 teacher abilities are of 
complexity one with respect to these four 
factors. The factor with which each test is 
highly saturated is summarized in Table XI, 
page 181. 

Teacher-rating scales as used here have 
little in common with any of the other teacher 
abilities measured, including the ability of 
the teacher to promote pupil growth. The 
rating assigned to a teacher on one of these 
scales is dependent either upon teacher traits 
not measured here, upon characteristics of 
the rater rather than the teacher, or upon an 
interaction of these. The ratings are not 
acceptable as a substitute for pupil growth in 
evaluating the educative process. 

The pupil-gain index of teaching ability PGTA is dependent upon only the GKMA factor in this study. This indicates that the better teachers tend to be more generally informed, of greater mental ability, possessing more knowledge of their subject matter, liberal rather than conservative, understanding the principles of mental hygiene more fully and possessing trait MTIL. The factor loading .648 of PGTA on GKMA, however, accounts for only 42% of the total variance of teaching ability as measured by this index, indicating that other teacher factors, pupil factors, or measurement errors in this study operate to account for an equally large share of the variance of pupil development. Twenty-two percent of this variance is distributed among the other five factors in Va, 9% under common factor PESA, and 7% under factor EATP. Whether these amounts are significant or attributable solely to sampling errors can be answered only by subsequent investigations.


SECTION IV 


ANALYSIS OF STUDY NUMBER TWO: 
THE 1937-38 WISCONSIN STUDY 
OF TEACHING ABILITY 


The intercorrelation table of teacher abil- 
ities to be analyzed in this section is based 
on data gathered for the 1937-38 Wisconsin 
study of teaching ability. This study was the 
second of the series already described by 
Rolfe, pages 52-74. While Study Number 
One dealt with teachers in rural state graded 
schools, Study Number Two is based chiefly 
on teachers in one-room rural schools, teaching
the social studies in combined 7th and
8th grades. In Study Number One, PGTA 
was based on gains of pupils at the eighth 
grade level only. In Study Number Two the 
pupil gain index of the ability of these teach- 
ers is based upon the progress of their pupils 
at both grade levels combined. In all other 
respects the two studies were intended to be 
parallel studies. Most of the measures of 
teacher abilities used in Study Number One 
were used with the teachers of Study Number 
Two with several measures added. For a de- 
tailed description of this study and the several 
measures employed see Rolfe’s** report. A 
list of the teacher tests with their key num- 
bers is given below: 


(1) ACPE, American Council on Education Psychological Examination for College Freshmen, 1936 Edition.

(2b) TCPE (1938), Psychological Examination (Teachers College), Form C, 1938 Edition.

(3) ACCGT, American Council Civics and Government Test, Form B.

(4) MTIL, Morris Trait Index L.

(5) WSAI, Washburne: The Social Adjustment Inventory, Thaspic Edition.

(6b) TDRS (Av.), Torgerson Diagnostic Teacher Rating Scale of Instructional Activities. The rating used here is the average of the ratings assigned independently by (1) R. E. Gotham, (2) the supervising teacher, and (3) the county superintendent.

(7b) ASRS (Av.), Almy-Sorenson Rating Scale for Teachers: Average of three ratings as described under (6b).

(8b) MTRS (Av.), Michigan Teacher Rating Scale. Average of three ratings as described under (6b).

(9) OTFAE, Lewerenz-Steinmetz: Orientation Test Concerning Fundamental Aims of Education, 1935 Revision.

(10) ATTP, Yeager: Scale for Measuring Attitude Toward Teachers and the Teaching Profession.

(11b) PGTA (37-38), Pupil Gain Index of Teaching Ability in Social Studies at the Seventh and Eighth grade levels.

(20) BPI (Bn), Bernreuter Personality Inventory: Neurotic Tendency.

(21) BPI (Bd), Bernreuter Personality Inventory: Dominance.

(22) BPI (Bs), Bernreuter Personality Inventory: Self-Sufficiency.

(23) PFRS (Av.), A Scale for Evaluating the Personal Fitness of Teachers. Average of three ratings as described under (6b).

(24) PRS (Av.), Personality Rating Scale. Average of three ratings as described under (6b).

(25) SASST (Inf.), Hartman: Social Attitudes of Secondary School Teachers test: Part III.

(26) SASST (Att.), Hartman: Liberalism, the score on Social Attitudes of Secondary School Teachers test: Part I.

(27) SOCB, Wrightstone: Scale of Civic Beliefs.

(28) TTPR, Torgerson: A Test of Teacher-Pupil Relationship.

** Rolfe, "The Measurement of Teaching Ability: Study Number Two," Journal of Experimental Education, XIV (September, 1945), pp. 52-74.
TABLE XII
THE CORRELATION MATRIX Rb: INTERCORRELATION TABLE OF 20 TEACHER ABILITIES BASED ON 47 TEACHERS (Study Two)

[Table entries are not legible in the source.]

The correlation table. The twenty teacher measures described by Rolfe and listed above were correlated with each other in the following manner: (1) To render all scores positive, an appropriate constant was added to each set of scores that had negative entries; (2) the scores for each teacher were punched in an 80-column IBM Hollerith card; (3) the sums, sums of squares, and sums of cross products were obtained by the progressive digiting method by running the cards through an IBM numerical tabulator; and (4) the correlation coefficients were computed by substitution in the formula:

r_xy = (ΣXY - X̄ ΣY) / [√(ΣX² - X̄ ΣX) √(ΣY² - Ȳ ΣY)]

where X̄ denotes the mean of the X's and Ȳ denotes the mean of the Y's.
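The computation in steps (3) and (4) can be sketched as follows. The scores are hypothetical and serve only to exercise the formula; the function works from the same five sums that the tabulator run would supply. The final lines illustrate why adding a constant to a set of scores, as in step (1), leaves the coefficient unchanged.

```python
import math

def correlation_from_sums(xs, ys):
    """Product-moment r computed from sums, sums of squares, and cross products."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    mean_x, mean_y = sum_x / n, sum_y / n
    numerator = sum_xy - mean_x * sum_y
    denominator = (math.sqrt(sum_xx - mean_x * sum_x)
                   * math.sqrt(sum_yy - mean_y * sum_y))
    return numerator / denominator

# Hypothetical scores of five teachers on two measures.
xs = [12, 15, 9, 20, 17]
ys = [30, 34, 25, 41, 33]

r = correlation_from_sums(xs, ys)
# Adding a constant to every X score (to make all scores positive) changes nothing.
r_shifted = correlation_from_sums([x + 50 for x in xs], ys)

print(round(r, 3), round(r_shifted, 3))
```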

The resultant table of intercorrelations is reported by Rolfe.*

In order to maximize the number of positive correlation coefficients in the matrix to be factored here, variables (10) ATTP, (20) BPI (Bn), (23) PFRS (Av.), and (24) PRS (Av.) were reflected. The correlation table with these four measures reversed in direction from that described in 4.14 is presented in Table XII. In this correlation matrix, Rb, 50 of the coefficients are negative. Since the matrix is symmetric, this means that 25 of the 190 side entries on one side of the diagonal are negative. The frequency distribution of the 190 different correlation coefficients follows in Table XIII.

The algebraic mean of the 190 side entries in Rb, obtained by addition of the individual entries, is .210, and the average of their absolute values is .222.

* Ibid., pp. 66-67.


TABLE XIII
FREQUENCY DISTRIBUTION OF THE INTERCORRELATIONS OF THE TWENTY TEACHER ABILITIES

Class Interval                  f
[bounds not legible]           10
[bounds not legible]            6
[bounds not legible]           14
[bounds not legible]           51
[bounds not legible]           84
[bounds not legible]           23
[bounds not legible]            2

Total                         190

THE FACTOR ANALYSIS

The centroid factorization. The centroid factorization of Rb was carried out in precisely the same manner as that described for Ra in section III with the exception of the manner of sign changing when reflecting a set of residuals prior to the extraction of the next factor. The reflection of measures (10), (20), (23), and (24) described above rendered all of the B's for Rb positive. After the extraction of the first and all subsequent centroid factors, the manner of sign changing was to (1) compute the B's for each column of the residual table, (2) first reflect that test for which B was most negative, (3) compute new B's for all columns of the residual table, (4) reflect whichever test then had the most negative B, and (5) continue this process until the B's for all columns of the table were positive. Table XIV shows which tests were reflected in each table of residuals.

TABLE XIV
SIGN CHANGING TABLE FOR CENTROID FACTORIZATION OF Rb

[Table body not legible in the source. Rows: the twenty tests. Columns: reflection* of residuals after extracting each factor.]

* The X's indicate the tests that were reflected in the factor residual tables.
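The sign-changing steps described above can be sketched in code. The sketch assumes that B denotes the sum of a column of the residual table with the diagonal cell left out, which is not stated in this section, and it uses a small hypothetical residual table. Each reflection of a test with a negative B raises the total of the side entries, so the loop must come to an end.

```python
def column_sums(residuals):
    """B for each test: sum of its residuals, diagonal excluded (an assumption)."""
    n = len(residuals)
    return [sum(residuals[i][j] for i in range(n) if i != j) for j in range(n)]

def reflect(residuals, k):
    """Reverse the direction of test k: negate row k and column k in place."""
    for i in range(len(residuals)):
        if i != k:
            residuals[i][k] = -residuals[i][k]
            residuals[k][i] = -residuals[k][i]

def sign_change(residuals):
    """Reflect the test with the most negative B until no B is negative."""
    table = [row[:] for row in residuals]   # work on a copy
    reflected = []
    while True:
        b = column_sums(table)
        worst = min(range(len(b)), key=lambda j: b[j])
        if b[worst] >= 0:
            return table, reflected
        reflect(table, worst)
        reflected.append(worst)

# Hypothetical 4 x 4 table of residuals (diagonal cells are ignored).
residuals = [
    [0.0, 0.2, -0.3, -0.1],
    [0.2, 0.0, -0.2, -0.4],
    [-0.3, -0.2, 0.0, 0.3],
    [-0.1, -0.4, 0.3, 0.0],
]

table, reflected = sign_change(residuals)
print(reflected)
print(column_sums(table))
```

The list of reflected tests plays the part of one column of X's in a sign-changing table.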

Again, as in Study Number One, the McNemar criterion of when to stop factoring was used. Table XV lists the standard deviation of the residuals, the mean communality of the tests, and the value of σ after the extraction of each of the six factors.

TABLE XV
McNEMAR INDEX OF WHEN TO STOP FACTORING (Study B)

After     S. D. of     Mean Com-
factor    residuals    munality      σ
I         .168         .262         .228
II        .119         .421         .205
III       .086         .528         .182
IV        .068         .590         .166
V         .052         .644         .146
VI        .041         .675         .126

Since the correlations in this study are based upon a sample of 47 teachers, the critical value to which σ must reduce when all common factors have been removed is 1/√47 = .146. It is seen that σ drops to this value after the extraction of the fifth factor, indicating that five common factors are sufficient to reproduce the correlation matrix, Rb, within discrepancies that can reasonably be attributed to sampling errors. Although but five common factors are thus indicated, the sixth factor was also retained to provide more degrees of freedom in the process of rotation.
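The stopping rule can be retraced from the figures of Table XV. In the sketch below the critical value is 1/√N as stated in the text. The recomputation of the index as the S. D. of the residuals divided by one minus the mean communality is an inference from the tabled figures, with which it agrees to within rounding, and is not a formula quoted in this section.

```python
import math

# Values from Table XV, one entry per extracted factor.
sd_residuals = [.168, .119, .086, .068, .052, .041]
mean_communality = [.262, .421, .528, .590, .644, .675]
tabled_index = [.228, .205, .182, .166, .146, .126]

def critical_value(n_cases):
    """Value to which the index must fall: 1 / sqrt(N)."""
    return 1 / math.sqrt(n_cases)

def factors_needed(index_values, n_cases, tolerance=0.0005):
    """First factor count at which the index reaches 1/sqrt(N) to three decimals."""
    limit = critical_value(n_cases)
    for count, value in enumerate(index_values, start=1):
        if value <= limit + tolerance:
            return count
    return None

# Inferred reading of the index: S.D. of residuals / (1 - mean communality).
recomputed = [round(s / (1 - h), 3)
              for s, h in zip(sd_residuals, mean_communality)]

print(round(critical_value(47), 3))
print(factors_needed(tabled_index, 47))
print(recomputed)
```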

The resultant centroid factorial matrix, Fb, is reproduced in Table XVI. The first column of Table XVI designates the test to which each row refers. The next six columns comprise the factorial matrix Fb. In the seventh column is listed the sum of squares for each test; i.e., the communality, h². The last column lists the test reliabilities. The bottom row lists the sum of squares of the factor loadings for each column of Fb which indicates the amount of the total test variance accounted for by each factor. Since these six entries add up to 13.50, which is 67.5% of 20, it follows that 67.5% of the total test variance is accounted for by these six centroid factors.

As was pointed out in section III, the test reliability should equal or exceed the communality. Inspection of the last two columns of Table XVI bears this out in all cases except for measures (23) and (24). The source of the reliability coefficients reported by Gotham for these measures was not cited. The communalities .94 and .91 indicate that the reliability of these two measures in the population of teachers investigated here is more likely in the neighborhood of .90.

TABLE XVI
THE CENTROID MATRIX Fb: FACTOR LOADINGS OF 20 TEACHER ABILITIES ON SIX ARBITRARY, ORTHOGONAL, CENTROID COMMON FACTORS

[Factor loadings, communalities, and reliabilities are not legible in the source.]

                    Centroid Factor
                    I      II     III    IV     V      VI
Sum of Squares      5.25   3.17   2.13   ...    ...    .62

* No reliability coefficients available.
... Entry not legible in the source.


The test communalities range from .37 to 
.94 with an average of .68. On the average, 
then, approximately two-thirds of the vari- 
ance of a test is accounted for by these six 
centroid factors, leaving only one-third to test 
specificity and sampling errors combined. 
Five of the communalities, however, are less 
than .50, indicating that more than half of the 
total variance for these tests is accounted for 
by factors not common in this battery of 
measures. 

Only five of the sixth factor residuals
exceeded .1 in absolute value. These were .102,
.107, .112, .123, and .153. This indicates that
the centroid matrix F₂ can be accepted as one
which reproduces the correlation matrix R₂
by the product F₂F₂′ well within the limits
of sampling errors of correlation coefficients
based on 47 cases.
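The residual check described above amounts to subtracting the reproduced correlations from the observed ones and inspecting what is left. The sketch below uses a small hypothetical loading matrix and a deliberately perturbed correlation matrix; the helper names are illustrative and none of the numbers come from the study.

```python
import numpy as np

def residual_matrix(R, F):
    """Residual correlations left after reproducing R by the product F F'."""
    return R - F @ F.T

def large_residuals(R, F, cutoff=0.1):
    """Distinct off-diagonal residuals whose absolute value exceeds cutoff."""
    res = residual_matrix(R, F)
    upper = res[np.triu_indices_from(res, k=1)]
    return np.sort(np.abs(upper[np.abs(upper) > cutoff]))

# Hypothetical loadings and a correlation matrix they reproduce exactly,
# except for one entry that is perturbed on purpose.
F = np.array([[0.8, 0.3], [0.7, -0.2], [0.6, 0.5], [0.5, -0.4]])
R = F @ F.T
np.fill_diagonal(R, 1.0)
R[0, 1] += 0.12
R[1, 0] += 0.12

flagged = large_residuals(R, F)

# Sampling-error yardstick for correlations based on 47 cases.
sampling_sigma = 1 / np.sqrt(47)
```

Only the off-diagonal residuals are examined, since the diagonal of the residual matrix reflects uniqueness rather than lack of fit.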

Rotation of the orthogonal reference frame. 
The rotation of F₂ was undertaken in
exactly the same manner as that described for
F₁ in section III. The object again was to
maximize the number of near-zero factor
loadings while maintaining the mutual
orthogonality of the common reference axes. The
index of structure fit, ISF, developed in
section III was used at each step to determine
which of several plausible rotations to use.


TABLE XVI (continued)

  VI      h²     Rel.
 .182    .82
 .3877   .68
 .256    .82
 .111    .37
 .218    .75
 .096
 .204
 .227
 .160
 .146
 .067
 .090
 .047
 .079
 .161
 .082
 .157
 .236
 .057
 .196

 .62


For the first two partial rotations, three 
dimensional sections of the factorial matrix 
were plotted on the surface of the blackboard 
sphere as previously described. Four more 
partial rotations were arrived at by consider- 
ing all possible two-dimensional sections of 
the factorial matrix as plotted on graph paper. 


The sets of axes which were rotated to- 
gether in each partial rotation and the index 
of structure fit, ISF, after each partial rota- 
tion are listed in Table XVII. 


Axes rotated as a set in a given partial
rotation are denoted by the same letter in the
row corresponding to that rotation. Cells
which are empty denote axes for which no
apparent rotation was indicated in the given
G₂ⱼ. All partial rotation transformation
matrices, G₂ⱼ, are reproduced in Table XVIII.

As indicated in Table XVII each partial
rotation led to a reduction in ISF. The ISF
listed for each partial rotation is the value
obtained in the process of rotating the
original factorial matrix.* The values for ISF
listed in parentheses for the matrix F₂ and
the final rotated matrix V₂ are the values
which apply to the corrected factorial
matrices. After rotation G₂₆ no further
improvements of fit could be found, although
several were tried. The matrix G₂₇ was
applied merely to interchange the columns of
the factorial matrix to facilitate subsequent
discussion.


TABLE XVII

SETS OF AXES ROTATED IN EACH PARTIAL ROTATION OF F₂

Partial Rotation            ISF after Rotation
                            2.097 (2.093)
                            1.768
                            1.715
                            1.651
                            1.601
                            1.598
                            1.589 (1.599)
Interchange of columns      1.589 (1.599)


TABLE XVIII

THE PARTIAL ROTATION TRANSFORMATION MATRICES G₂ⱼ


The product of the seven partial rotation
transformation matrices G₂ⱼ yields the single
matrix G₂ which will rotate F₂ into the
desired final factorial matrix V₂, where

G₂ = G₂₁ G₂₂ G₂₃ G₂₄ G₂₅ G₂₆ G₂₇


This transformation matrix, G₂, is given in
Table XIX. In the lower half of Table XIX,
the matrix product G₂′G₂ is given. Since all
of the diagonal entries round to unity and all
but one of the fifteen distinct side entries
round to zero (the exception rounding to
.01), if only two decimal places are
considered, the transformation matrix G₂ can be
considered orthogonal for all practical
purposes.

The application of G₂ to F₂ on the right
leads to the rotated orthogonal common factor
matrix, V₂ = F₂G₂, which it was the purpose
of this analysis to isolate. V₂ is reproduced in
Table XX, page 190.
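The chain of partial rotations and the orthogonality check can be illustrated as follows. This is a sketch under simplifying assumptions: three factors instead of six, plane rotations of two axes at a time, and angles and loadings chosen arbitrarily. The helper `plane_rotation` is hypothetical and is not the procedure used in the study.

```python
import numpy as np

def plane_rotation(n, i, j, angle):
    """n x n orthogonal matrix rotating axes i and j through `angle` radians."""
    G = np.eye(n)
    c, s = np.cos(angle), np.sin(angle)
    G[i, i], G[i, j] = c, -s
    G[j, i], G[j, j] = s, c
    return G

# Hypothetical partial rotations (axes and angles are illustrative only).
partials = [
    plane_rotation(3, 0, 1, 0.6),
    plane_rotation(3, 1, 2, -0.4),
    plane_rotation(3, 0, 2, 0.25),
]

# Their product is the single transformation applied on the right.
G = np.linalg.multi_dot(partials)

# Hypothetical unrotated loadings for four tests on three factors.
F = np.array([
    [0.7,  0.4,  0.1],
    [0.6, -0.3,  0.2],
    [0.5,  0.5, -0.3],
    [0.4, -0.2,  0.6],
])
V = F @ G

# G'G should equal the identity up to rounding error.
orthogonality_error = np.abs(G.T @ G - np.eye(3)).max()

# An orthogonal rotation leaves every communality unchanged.
h2_before = (F ** 2).sum(axis=1)
h2_after = (V ** 2).sum(axis=1)
```

Because the product of orthogonal matrices is itself orthogonal, the rotated matrix reproduces exactly the same correlations as the unrotated one; only the distribution of each test's variance among the factors changes.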

* After the factor analysis of the originally reported
correlation [...] errors led to new PGTA (37-38) correlation
coefficients with the other teacher measures. The original
centroid matrix was modified to conform to the corrected
coefficients by a technique described by Dwyer in
Psychometrika (September, 1937), pp. 173-178. The modified
centroid matrix was subjected to the original rotation
transformation G₂. Although the resultant ISF was not quite
as good as for the original incorrect solution, no further
rotations were indicated. For a detailed [...] correction see
sections 9.1 and 9.2 in Appendix C of the original thesis.

The index of structure fit, ISF, for V₂ is
1.599 while this index was 2.093 for the
centroid matrix F₂. Factor loadings that account
for less than 5% of the test variance, i.e.
whose absolute values are less than .224, will
again be taken as “near-zero”. The number of
near-zeros in V₂ then is 84 while in F₂ the
number was 52. Thus the rotation of F₂ led
to an increase of 32 in the number of zeros.
The 84 zeros comprise 70% of the 120 cell
entries in V₂.
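The near-zero count can be reproduced mechanically once the cut-off is expressed as the square root of .05. The sketch below uses a small hypothetical loading matrix; only the cut-off and the ratio 84/120 are taken from the text.

```python
import numpy as np

# A loading whose square is 5% of a test's unit variance.
NEAR_ZERO = np.sqrt(0.05)

def count_near_zero(loadings, cutoff=NEAR_ZERO):
    """Number of loadings whose absolute value is below the cut-off."""
    return int((np.abs(np.asarray(loadings)) < cutoff).sum())

# Hypothetical rotated loadings (illustrative values only).
V_example = np.array([
    [0.86, 0.02, -0.10],
    [0.13, 0.14,  0.82],
    [0.01, 0.96,  0.03],
    [0.40, 0.45,  0.07],
])
n_zero = count_near_zero(V_example)

# The proportion quoted in the text: 84 near-zeros among 120 cells.
share_in_text = 84 / 120
```

Applying the same function to a matrix before and after rotation gives the kind of comparison reported above (52 near-zeros before, 84 after).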


TABLE XIX

THE ORTHOGONAL TRANSFORMATION MATRIX G₂
AND THE MATRIX PRODUCT G₂′G₂

G₂

  .600    .702    .326    .155    .083    .102
 -.508    .616   -.498    .008    .342    .050
 -.522    .271    .636   -.380   -.325    .013
  .081    .237   -.344    .082   -.727   -.532
  .307    .041   -.329   -.822   -.122    .327
  .108   -.010    .122   -.388    .478   -.775

G₂′G₂

 1.003   -.002    .001    .000    .001    .003
 -.002   1.004   -.002    .000   -.001    .001
  .001   -.002   1.000    .000   -.001   -.003
  .000    .000    .000   1.001   -.006    .000
  .001   -.001   -.001   -.006   1.001   -.002
  .003    .001   -.003    .000   -.002   1.004


TABLE XX

THE ROTATED ORTHOGONAL FACTORIAL MATRIX V₂

Common Factor

  GKMA    TRS     PEA
  .859    .022   -.096
  .629    .075    .072
  .841    .111    .248
  .323    .174    .273
  .131    .139    .823
  .008    .120    .884
  .126    .010    .638
  .575    .080    .033
  .401    .449    .074
  .011    .959    .031
  .012
  .038
  .014
  .098
  .333    .240
  .351    .214
  .868    .043
  .013    .088
  .227   -.047
 -.062    .021

  .102
  .010
  .037
  .102

TTPR
PGTA (37-38)
Sum of Squares    2.16    5.00


INTERPRETATION OF THE FACTORIAL MATRIX V₂

The common factors. The common factors
in V₂ can be interpreted by analyzing
the teacher measures that have significant
and near-zero loadings on each factor. As in
section III near-zero loadings will be those
that account for 5% or less of the total test
variance, i.e. loadings whose absolute values
are equal to or less than .224. Significant
loadings will be those that account for at least
25% of the total test variance, i.e. loadings
equal to or greater than .50. The resultant
factor pattern is reproduced in Table XXI,
where X’s indicate significant and O’s
indicate zero loadings in the above sense. Cells
left blank indicate intermediate loading
values. None of the significant loadings in
V₂ is negative, indicating that the twenty
teacher abilities are essentially contained in
the positive manifold.

Before interpreting the common factors, it
is important to recall the direction each test
has in V₂ and Table XXI. Measures (20)
BPI (Bn), (23) PFRS (Av.), (24) PRS
(Av.), and (10) ATTP were reflected from
the sense they had prior to the centroid
factorization. These four reflections are all that
were made in any part of the analysis. Hence
in V₂ and Table XXI these four measures
are reversed in direction and all other
measures have the same direction as that
described by Rolfe. The direction of each
measure as it enters into V₂ and Table XXI
is indicated in the last column of Table XXI
where the high score end of each trait
continuum is briefly labeled.
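The X, O, and blank coding of the factor pattern follows directly from the two cut-offs stated in this section (5% and 25% of the test variance). A minimal sketch, with hypothetical helper names and three illustrative rows of loadings:

```python
import math

ZERO_CUTOFF = math.sqrt(0.05)         # about .224, i.e. 5% of the variance
SIGNIFICANT_CUTOFF = math.sqrt(0.25)  # .50, i.e. 25% of the variance

def pattern_symbol(loading):
    """'X' for a significant loading, 'O' for a near-zero one, blank otherwise."""
    a = abs(loading)
    if a >= SIGNIFICANT_CUTOFF:
        return "X"
    if a <= ZERO_CUTOFF:
        return "O"
    return " "

def factor_pattern(rows):
    """One pattern string per test, one character per factor."""
    return ["".join(pattern_symbol(x) for x in row) for row in rows]

# Illustrative loadings for three tests on three factors.
example = [
    [0.859, 0.022, -0.096],
    [0.401, 0.449,  0.074],
    [0.011, 0.959,  0.031],
]
pattern = factor_pattern(example)
```

A test whose row contains exactly one X is of complexity one in the sense used later in this section; a row with no X, like the second one here, cannot help to identify any factor.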





TABLE XXI

THE COMMON FACTOR PATTERN OF 20 TEACHER ABILITIES IN TERMS OF ZEROS
AND SIGNIFICANT LOADINGS

Common Factor
GKMA    TRS    PEA    EATP    PGTA


The first common factor: (GKMA) general
knowledge and mental ability—The tests 
having significant and zero loadings on the 
first common factor are: 


Significant Loadings 


The four highest loadings on this factor 
involve the two intelligence tests and two 
subject matter tests. From 40% to 75% of 
the total variance of these four tests is 
accounted for by this factor. Three of these 
four tests have no significant loadings on the 
other five factors. It would appear therefore 
that this factor might well be termed: Gen- 
eral Knowledge and Mental Ability (GKMA). 
The significant loading of (4) MTIL on this 
factor suggests again, as in Study Number 
One, that leadership as defined by this test 
calls for the type of responses well-informed 
and intelligent persons make. 

Ten of the remaining fifteen measures in 
this battery have zero loadings on GKMA. 
These include the five rating scales, the three 
Bernreuter measures, ATTP, and PGTA (37- 
38). Hence supervisory ratings of teachers, 
the emotional adjustment of the teacher, and 
the amount the pupils learned appear to be 
unrelated to the GKMA common teacher
factor.


High Scores Indicate

Intelligence
Intelligence
Knowledge
Knowledge
Emotional stability
Dominance
Self-sufficiency
Leadership
Social adjustment
High ratings
High ratings
High ratings
High ratings
High ratings
Liberal attitude
Liberal beliefs
Knowledge
Eulogizing attitude
Sound teaching practices
Large pupil gains


BPI (Bn)
BPI (Bd)
BPI (Bs)
PFRS (Av.)
PRS (Av.)
TDRS (Av.)
ASRS (Av.)
MTRS (Av.)
ATTP
PGTA (37-38)


The remaining five measures had neither 
zero nor significant loadings on this common 
factor. These five measures are: (5) WSAI, 
(25) SASST (Inf.), (26) SASST (Att.), (27) 
SOCB, and (28) TTPR. The communalities 
of these tests in this battery, as listed in 
Table XVI are .37, .51, .43, .46, and .56
respectively. Only two other tests in this battery
have communalities less than .61, i.e.
(22) BPI (Bs) and (11b) PGTA (37-38) 
which have communalities of .53 and .46 re- 
spectively. Reference to Table XXI for the 
five measures that have neither zero nor sig- 
nificant loadings on GKMA reveals that 
these are the five tests that have no signifi- 
cant loading on any of the six factors. The 
fraction of their total test variance that these 
measures have in common with this battery 
of tests is distributed among the six factors 
in such a manner that none of the loadings 
exceeds .460. In this analysis therefore these 
five measures can contribute but little to the 
identification of any of the common factors. 







The second common factor: (TRS) teacher
rating scale factor. The measures that have
significant and zero loadings on this factor
are:

Significant Loadings
                 .959
ASRS (Av.)       .948
                 .923

Zeros
(1) ACPE
(2b) TCPE (1938)
(3) ACCGT
SASST (Inf.)
BPI (Bn)
BPI (Bd)

This common factor is characterized solely
by the five supervisory rating measures that
were applied to the teachers. Eighty percent
or more of the total variance of each of these
ratings is accounted for by this factor. The
loadings of these five measures on the other
five factors in V₂ are all well within the
near-zero range. Hence this factor is here called:
Teacher Rating Scale Factor (TRS).

Twelve of the measures in this battery have
zero loadings on TRS. The remaining three
loadings are: (5) WSAI, .449; (26) SASST
(Att.), .386; and (11b) PGTA (37-38), .394,
indicating that roughly 16% of the variance
of these three measures is related to TRS.

Teacher rating scales, whether designed to
measure teaching ability or personality,
whether consisting of 6 items (PRS) or 68
items (MTRS), apparently measure pretty
much the same thing, and whatever this may
be, it appears to be related to supervisory
ratings of teachers and to little else.

The third common factor: (PEA) personal
emotional adjustment. The significant and
zero loadings on this factor are:

Significant Loadings
(21) BPI (Bd)
(20) BPI (Bn)
(22) BPI (Bs)

Zeros
(1) ACPE
(2b) TCPE (1938)
(4) MTIL
(5) WSAI
(23) PFRS (Av.)
(24) PRS (Av.)
(6b) TDRS (Av.)
(7b) ASRS (Av.)
(8b) MTRS (Av.)
(27) SOCB
(9)
(10) ATTP
(28) TTPR
(11b) PGTA (37-38)

The only significant loadings on this
common factor are found for the three Bernreuter
scales. This is the only factor on which these
scales have significant loadings. Hence this
factor is concerned with the teacher’s personal
emotional adjustment as she feels within
herself. It has been named the Personal
Emotional Adjustment Factor (PEA) of the
teacher. High scores on this factor correspond
to emotionally stable, self-sufficient, dominant
persons as measured by BPI. Low scores
would be associated with persons who are
emotionally unstable, submissive, and
frequently in need of encouragement and advice.

This teacher personality factor apparently
does not enter into ratings assigned to
teachers by supervisors since all rating scales
studied here show zero loadings on PEA. It
likewise appears to be unrelated to teacher
intelligence, knowledge of subject matter or
any of the other measures in this battery.

The fourth factor: (EATP) eulogizing
attitude toward the teaching profession.
Only the measure (10) ATTP has a
significant loading on this factor, i.e. .621. The only
other loadings that exceed .4 are for (5)
WSAI, .420, and (28) TTPR, .433. Sixteen
of the remaining seventeen loadings are in the
zero range. It is doubtful that this factor
should be called a common factor in this
battery. However, since test (28) TTPR
closely resembles test (19) TPPT which was
used in Study One and since the loadings of
these two tests together with (10) ATTP
closely resemble each other with respect to
the fourth factor in each study, e.g.:

EATP FACTOR LOADINGS IN V₁ AND V₂

Study One                Study Two
(10) ATTP    .766        (10) ATTP    .621
(19) TPPT    .56         (28) TTPR    .433

the writer is inclined to believe that
essentially the same factor is involved in both
studies. The Yeager scale (10) ATTP has
zero loadings on all other common factors.
Hence this factor is best characterized by the
attitude that (10) ATTP measures.
Accordingly this factor is again named Eulogizing
Attitude Toward the Teaching Profession
(EATP) without, however, calling it a
common factor here. The loading of .433 of (28)
TTPR on this factor once more indicates the
tendency for teachers versed in the theory
and practice of mental hygiene in dealing
with pupils to likewise have a sympathetic
attitude toward the teaching profession and
teachers. Perhaps this factor is related to a
general attitude of sympathy for all fellow
beings, pupils and teachers as well as others.
More measures of attitude toward various
groups would, however, have to be included
in a subsequent battery of teacher traits to
test this hypothesis.

The fifth factor: (PGTA) the teaching
ability factor. The three loadings on this
factor that exceed .4 are:

(11b) PGTA (37-38)    .530
(27) SOCB             .460
(28) TTPR

Sixteen of the remaining seventeen loadings
are in the zero range. This is the only factor
on which the pupil gain index of teaching
ability PGTA (37-38) has a significant
loading in this study. Hence it is named The
Teaching Ability Factor: PGTA, without,
however, calling it a common factor here.
The other two moderate loadings suggest that
the classes that progressed most in social
studies were taught by teachers who had a
better working knowledge of mental hygiene
as it manifests itself in teacher-pupil
relationships and whose social beliefs as measured by
SOCB tended to be liberal rather than
conservative. The SOCB measure, it will be
recalled, was also one of the eight social studies
tests used to measure the pupil gain
underlying the PGTA (37-38) index of teaching
ability.

The sixth factor —Only one loading on this 
factor is significant: (2b) TCPE (1938), .515. 
Measure (26) SASST (Att.) has a loading of 
.407, and (4) MTIL has a loading of —.416. 
Sixteen of the remaining 17 loadings are 
zeros. Both TCPE (1938) and MTIL have 
larger shares of their test variance accounted 
for by common factor GKMA while SASST 
(Att.) has its variance distributed over sev- 
eral factors with no loading in excess of .407. 
Although three loadings exceed .4 in absolute 
value, the writer can identify no quality that 
would account for these factor loadings on 
these three tests. This factor may well be 
made up chiefly of residual errors. No 
attempt will therefore be made to interpret 
or name it here. 

Common factor conclusions. Three common
factors definitely stand out in this analysis,
i.e. GKMA, TRS, and PEA. If one is
willing to consider loadings in the .4 to .5
range as significant, the EATP and PGTA
factors can also be considered common factors
in this analysis. The sixth factor appears to
correspond to a residual error plane. The
McNemar criterion of when to stop factoring
indicated that five common factors are
required to account for the correlation matrix
R₂. Hence it might well be said that the first
five columns of V₂ constitute the meaningful
common factor pattern underlying the
correlation matrix R₂. These five common factors
are mutually orthogonal, i.e. GKMA, TRS,
PEA, EATP, and PGTA are uncorrelated
with each other in the teacher sample upon
which Study Number Two is based.

The common factor structure of the 20 
teacher abilities and the 5 common factors 
falls short of being a positive orthogonal 
simple structure only in that factors EATP 
and PGTA do not have five pairs of trait 
loadings that are zero for one factor and
significant for the other, unless one is willing
to accept loadings as small as .4 as being
significant. The structure is however orthogonal
and essentially positive. 

The first five columns of V₂ have only 15
significant loadings that need to be retained
to comprehend the interrelationships of the
20 abilities for the 47 teachers. Thus, what
was originally represented by 47 × 20 =
940 teacher scores, then reduced to 190
correlation coefficients, can now be grasped in
terms of 15 factor loadings arranged in five
columns. The 20 teacher measures need no
longer be thought of as 20 interrelated traits
but can be comprehended in terms of five
independent common factors.
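The counts in this paragraph follow from the size of the sample and of the battery: 190 is the number of distinct pairs among 20 measures. A short check, with variable names chosen here for illustration:

```python
from math import comb

n_teachers, n_measures = 47, 20

# One score per teacher per measure.
raw_scores = n_teachers * n_measures

# One correlation coefficient per distinct pair of measures.
distinct_correlations = comb(n_measures, 2)

# Significant loadings retained in the first five columns, as stated in the text.
significant_loadings = 15
```

The reduction is therefore from 940 scores to 190 coefficients to 15 loadings.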

The tests. Considering only the first five
columns of V₂, the complexity of each of the
teacher measures with respect to the common
factors is unity, except for the five measures
that have no significant loading on any of
the common factors. The common factor
measured by each test has already been
treated and is summarized in Table XXI.

The pupil gain index of teaching ability 
(11b) PGTA (37-38) has a significant load- 
ing only on the factor by the same name. 
The only other PGTA loading that is not in 
the near-zero range is found for common 
factor TRS where a factor loading of .394 
accounts for 15.5% of the total variance of 
PGTA (37-38). The loading of .530 on 
PGTA on the other hand accounts for 28% 
of the variance of this measure. It can be con- 
tended from these loadings that in this study 
a slight tendency of high teacher ratings to 
be associated with large pupil gains is shown. 
But on the whole, the amount seventh and 
eighth grade pupils progress in a one-room 
rural school in the field of social studies dur- 
ing one school year, appears to be unrelated 
to teacher factors GKMA, PEA, and EATP 
and only slightly related to the ratings as- 
signed to the teachers by their supervisors. 
This finding contrasts with that reported for 
Study One where PGTA (36-37) was found 
to be associated significantly with teacher 
factor GKMA. In the final section of this 
report a possible rationalization of this differ- 
ence will be presented. 
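The percentages quoted for PGTA (37-38) are simply the squared loadings, since with orthogonal factors the square of a loading is the share of a test's variance associated with that factor. A minimal sketch using the two loadings given in the text (the function name is illustrative):

```python
def variance_share(loading: float) -> float:
    """Share of a test's unit variance associated with one orthogonal factor."""
    return loading ** 2

# Loadings of PGTA (37-38) quoted in the text.
pgta_loadings = {"TRS": 0.394, "PGTA": 0.530}

# Shares expressed as percentages, rounded to one decimal place.
shares = {
    factor: round(100 * variance_share(a), 1)
    for factor, a in pgta_loadings.items()
}
```

The results, about 15.5% and 28%, match the figures in the paragraph above.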


CONCLUSIONS 


This analysis indicates that five common
factors are sufficient to account for the
interrelationships observed between the twenty
teacher abilities investigated. These five
common factors are mutually independent of each
other and can be termed the primary teacher
abilities underlying this battery of teacher
measures. The five factors are:


GKMA, General Knowledge and Mental 
Ability 

TRS, Teacher Rating Scale Factor 

PEA, Personal Emotional Adjustment 

EATP, Eulogizing Attitude Toward the 
Teaching Profession 

PGTA, Teaching Ability Factor 


The five teacher measures (5) WSAI, (25) 
SASST (Inf.), (26) SASST (Att.), (27) 
SOCB, and (28) TTPR had low communal- 
ities in this battery of teacher abilities and 
are not well defined with respect to the five 
common factors found in this analysis. The 
remaining teacher abilities are of complexity 
one with respect to the common factors. 
Table XXI summarizes the relationships 
found between the common factors and the 
tests. 


Teacher rating scales as used here have 
little in common with any of the other teacher 
traits and abilities measured, but show a 
slight relationship to the ability of the teacher 
to promote pupil growth in the field of social 
studies. 


The pupil gain index of teaching ability 
might more properly be termed a specific than 
a common factor in this test battery. Pupil 
gain in a one-room rural school appears to be 
unrelated to a host of teacher qualities. Spe- 
cifically, the amount pupils learn appears to 
be unrelated to the teacher’s general knowl- 
edge and mental ability, only slightly related 
to the ratings assigned to her by her super- 
visors, unrelated to her personality as meas- 
ured by the Bernreuter scales, and independ- 
ent of her attitude toward the teaching pro- 
fession. If the fifth common factor has been 
correctly named, however, there is a tendency 
for teachers possessing liberal social beliefs 
and a sound working knowledge of the symp- 
toms, causes, and remedies of various kinds 
of pupil adjustment to produce greater pupil 
growth than teachers who lack these traits. 







SECTION V 
SUMMARY AND CONCLUSIONS 


Restatement of the problem.—The purpose 
of this investigation was to analyze the inter- 
correlations between various instruments fre- 
quently used in studies of teaching ability to 
determine the number and nature of common 
factors that might best be used to describe 
these teacher traits. Having found such basic 
factors, this study is further concerned with 
the manner in which each teacher measure is 
related to the common factors. 


Brief comparison of the two studies. Two
samples of rural school teachers were selected 
for studying this problem. A battery of meas- 
ures was obtained for each teacher sample, 
measuring the teacher’s mental ability, 
knowledge of subject matter, personality 
traits, social adjustment, supervisory ratings 
and attitudes. Eleven of the measures were 
common to both studies. One of these was the 
pupil gain index of the ability of the teacher 
to teach social studies. 


Comparison of the average scores and
standard deviations of the two teacher
samples as reported by Rostker and Rolfe
indicates that although the teachers in the
1937-38 Wisconsin Study (Study Number Two)
averaged somewhat lower in the mental
ability and informational tests than those in the
1936-37 Wisconsin Study (Study Number
One), both groups were about equally variable
with respect to the several traits measured.
All levels of emotional adjustment were found
in both groups. The social adjustment of both
samples was above average.


The teachers in Study Number One taught
only the seventh and eighth grades in a
four-room state graded school, and all but five in
Study Number Two taught all eight grades
in one-room schools. The PGTA index was
based on the progress of pupils in only one
grade in Study Number One, while in
Study Number Two it was based on the pupil
progress in both the 7th and 8th grades
combined.

The teachers in Study One were on the
average seven years older, had four years
more teaching experience, and had two years
more training beyond high school than the
teachers in Study Two. In Study Number
One 58% of the teachers were women, while
in Study Number Two 85% were women.




In each study the teacher measures were 
correlated with each other, the resultant cor- 
relational matrix was factored by the centroid 
method, and the factorial matrix rotated to a
position that maximized the number of van- 
ishing factor loadings but maintained the 
mutual independence of the common factor 
axes. 
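The factoring step can be illustrated by a single pass of the centroid method as it is commonly described: each loading is the column sum of the correlation matrix divided by the square root of the grand sum. The sketch below is not the computation carried out in these studies. It assumes communality estimates are already on the diagonal and that all column sums are positive, so the sign reflections used in practice are omitted; the function name and data are hypothetical.

```python
import numpy as np

def centroid_factor(R):
    """Extract one centroid factor from a correlation matrix R whose
    diagonal holds communality estimates. Assumes positive column sums,
    so no reflection of tests is attempted here."""
    col_sums = R.sum(axis=0)
    total = col_sums.sum()
    loadings = col_sums / np.sqrt(total)
    residual = R - np.outer(loadings, loadings)
    return loadings, residual

# Hypothetical one-factor example: correlations generated by a single
# set of loadings, with communalities on the diagonal.
a = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(a, a)

loadings, residual = centroid_factor(R)
```

In this one-factor case the generating loadings are recovered exactly and the residual matrix vanishes. With real data the residual matrix would be factored again, and the process repeated until the residuals are small enough to be attributed to sampling error.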


The common factors found in Studies One 
and Two.—Four common factors were found 
to underlie the teacher traits in Study Num- 
ber One and five in Study Number Two. 
Four of the common factors in Study Num- 
ber Two were identified as essentially the 
same as the four common factors identified 
in Study Number One. These four are: 


GKMA, General Knowledge and Mental 
Ability 

TRS, Teacher Rating Scale Factor 

PEA, Personal Emotional Adjustment*


EATP, Eulogizing Attitude Toward the 
Teaching Profession 


These four primary teacher factors are uncor- 
related with each other in both teacher 
samples. 

Table XXII lists all significant common
factor loadings of measures used in both
studies.


Non-significant loadings are also listed for 
those few tests that were used in both studies 
for which the common factor loading was 
significant in one but not in the other study. 
Wherever a test had a significant loading in 
one study but was not used in the other, a 
dash has been substituted for the unknown 
factor loading. 

In columns one and two under GKMA in 
Table XXII it is seen that the GKMA factor 
loadings of five measures common to both 
studies are substantially the same in both 
analyses. The GKMA loadings of PGTA and 
one of the teacher rating scales, TDRS, how- 
ever, were significant in Study Number One 
but zero in Study Number Two. The PGTA 
common factor loadings in the two studies 
differ significantly and shall be discussed in 
the next section. 

* Since factor PESA in Study Number One seems, aside
from the WSAI factor loading, to be essentially the same as
PEA in Study Number Two, both are referred to here as PEA.


TABLE XXII

SIGNIFICANT FACTOR LOADINGS OF TEACHER TRAITS IN BOTH STUDIES (ONE AND TWO) CLASSIFIED
WITH RESPECT TO THE PRIMARY TEACHER ABILITIES

Common Factor
Teacher Trait    GKMA    TRS    PEA    EATP    PGTA

 .86
 .82
 .88
 .64
-.07


The loading of .50 of TDRS on GKMA in
Study One is not supported by significant
loadings of the ASRS and MTRS on the
GKMA factor in the same study. All GKMA
factor loadings of teacher rating scales in
Study Two are zero. It does not appear
therefore that the one loading of .50 for TDRS on
GKMA in Study Number One should be
interpreted as indicating a definite dependence
of supervisory ratings on the teacher’s
general knowledge and mental ability.

The significant loadings of AORM and
TPMH on GKMA in Study Number One are
in keeping with the name given to this
common factor, since AORM measures knowledge
and skill in social studies research and TPMH
measures professional knowledge in the field
of mental hygiene and pupil adjustment
problems. The significant loading of SASST (Tot.)
on GKMA reflects the observed tendency of
informed liberalism to be associated with
knowledge in many fields and above average
mental ability.

Aside from the PGTA loadings, the GKMA
primary teacher factor appears to enter into
the teacher measures of both studies in
essentially the same manner.

The significant factor loadings listed for
both studies under TRS in the above table
include all of the supervisory rating scales
and none of the other measures. The TRS
common factor clearly manifests itself in
essentially the same manner in both studies.
As was pointed out in section III the
supervisory ratings assigned to teachers appear to
be unrelated to any of the other teacher traits
measured here. This appears to be true
whether the rating scale ostensibly measures
the teaching ability of the teacher or her
personality traits or effect.

The PEA common teacher factor in the
above table shows that all the scores derived
from The Personality Inventory have their
only significant common factor loadings on
this factor. In Study Number One the WSAI
social adjustment measure is saturated with
PEA but in Study Number Two no such
relationship is shown. It appears therefore that
in Study One social adjustment is associated
with healthy emotional adjustment, sociability
and wholesome self confidence, while in Study
Two social adjustment appears to be
unrelated to emotional stability, dominance, or
self-sufficiency.

The high saturation of SEAT(T-A) with
PEA in Study One suggests that the choices
made by the teachers on this scale are similar
to those made by emotionally well adjusted
persons. In this analysis these responses do
not, however, appear to be related to either
the supervisory ratings of teachers or to their
ability to teach as measured by PGTA.

The EATP column in the above table 
shows that the ATTP measure is independent 
of the first three common factors in both 
studies and has similar loadings on an atti- 
tude factor, one extreme of which has here 
been named Eulogizing Attitude Toward the 
Teaching Profession. The loading of .56 of 
TPPT in Study Number One on this factor 
indicates that a sympathetic and understand- 
ing attitude toward pupils is associated with 
EATP which might perhaps be called a gen- 
eral attitude of good will toward others. The 
loading of .46 of TTPR on EATP pointed 
out in section IV is consistent with this find- 
ing in Study Number One. 

In Study Number Two the four common 
factors just described appear to be unrelated 
to the PGTA index at the 25% level of sig- 
nificance used here. A fifth factor character- 
ized chiefly by the PGTA (37-38) loading of 
.53 was found necessary if this index of 
teaching ability was to be included in the 
common factor solution. This loading is listed 
in the last column of the above table. No 
similar loading appears for Study Number 
One since the first four common factors were 
all that appeared to be operative in that 
study. 

The common factor composition of the 
teacher measures in studies one and two.— 
With only one exception all the tests in both 


GKMA 
42% 
0% 


TRS 
1% 
16% 


Study One 
Study Two 


studies were of unit complexity with respect 
to the common factors, i.e. each test had only 
one significant common factor loading. The 
question of which primary factors each 
teacher test is predominantly a measure of is 
largely answered in the Tables XXII, XI, 
and XXI. 

The measures: ACPE, TCPE, ACCGT, 
OTFAE, MTIL, AORM, TPMH, and SASST 
(Tot.) are saturated significantly with only 
the GKMA primary teacher factor. 

The supervisory ratings: TDRS, ASRS, 
MTRS, PFRS, and PRS are related signifi- 
cantly to only the TRS common factor. 

The BPI scales Fc, Fs, Bn, Bd, and Bs 
and the SEAT(T-A) measure have most of
their common factor variance accounted for
by an emotional adjustment factor PEA. 

The ATTP and TPPT measures are related 
solely to an attitude toward persons which 
has here been named the EATP factor. 

The WSAI measure is related in Study 
Number One to only the PEA factor while 
in Study Number Two its variance is rather 
equally divided between factors GKMA, 
TRS, and EATP, none of the factor loadings 
in Study Number Two being significant how- 
ever. 

The SEAT(A-R) measure in Study One 
appears to be unrelated to the four primary 
teacher factors and measures something for- 
eign to the other measures studied. 

The SOCB measure in Study Two divides
its variance between GKMA and PGTA, 
neither factor loading being significant how- 
ever. 

The TTPR test in Study Two has mod- 
erate loadings on EATP and PGTA. 

The SASST (Inf.) and SASST (Att.) 
measures in Study Two distribute their com- 
mon factor variance among the common fac- 
tors so evenly that no clear description of 
their common factor composition is evident. 

The common factor composition of the 
ability of the teacher to teach social studies 
as measured here differs significantly in the 
two studies. The manner in which the total 
variance of the PGTA index distributes itself 
in the two studies can be tabulated as fol- 
lows: 


                                                 Specificity
             GKMA    TRS    PEA    EATP   PGTA   and Error
Study One     42%     1%     9%     1%              41%
Study Two      0%    16%     0%     1%     28%      55%

In Study Number One, 42% of the PGTA 
variance is related to the GKMA teacher 
factor and only minor amounts to the other 
common factors. In Study Number Two, 
28% of the PGTA variance is ascribable to 
PGTA and 16% to TRS. In Study One, 
41% of the variance of (11a) PGTA (36-37)
is apparently due to specificity and error
while in Study Two this figure is 55%. If the 
portions attributable to error seem unduly 
large, one should remember that the PGTA 
index as used in each study is the residual 
of the gross pupil gain that remains after 
certain pupil factors have been statistically 
controlled. 
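
The percentages quoted in this passage can be related to the factor loadings by squaring. The sketch below assumes uncorrelated common factors, as the study reports, so that the share of a measure's variance attributable to a factor is the square of its loading on that factor. The function name is illustrative and not part of the original analysis.

```python
def variance_share(loading):
    """Proportion of a measure's variance tied to one uncorrelated factor."""
    return loading ** 2


# Loadings of the PGTA index quoted in the text for Study Two.
pgta_on_fifth_factor = variance_share(0.53)   # loading on the PGTA factor
pgta_on_trs = variance_share(0.394)           # loading on the TRS factor

print(round(100 * pgta_on_fifth_factor), round(100 * pgta_on_trs))
```

Squaring .53 and .394 gives about 28% and 16%, the shares the text ascribes to PGTA and TRS in Study Two.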







For Study One it will be recalled that: 
(1) the reliability of the UWH gross pupil 
gain was .76. This means that 58% of the 
observed variance represented real variance 
and the remaining 42% represented error 
variance; (2) the multiple correlation be- 
tween the statistically controlled pupil factors 
and the UWH observed gain was .52. Hence 
the variance attributable to these pupil fac- 
tors represents 27% of the total variance of 
the observed UWH gain. The residuals, 
PGTA (36-37), therefore represent only 
73% of the variance initially observed in 
mean pupil gains; (3) the communality,
59%, and specificity, 41%, reported in the
factor analysis refer to the residual 73%
of the initial variance.

The same data for Study Two are: (1) the 
reliability of the measured pupil gains was 
.68, corresponding to 46% real and 54% 
error variance; (2) the multiple correlation 
coefficient of pupil control factors against 
observed pupil gain was .69, indicating that 
48% of the observed variance was attrib- 
utable to pupil factors leaving only 52% of 
the variance for the residuals, PGTA (37- 
38); (3) the communality of 45% and spec- 
ificity and error variance of 55% has refer- 
ence to the 52% of the original observed 
pupil gains that PGTA (37-38) represents. 

If one assumes that the reduction of vari- 
ance due to statistical control of pupil factors 
reduces the error variance in the same pro- 
portion as the real variance in the observed 
gains, the total variance of the observed pupil
gains in the two studies distributes itself as
shown in Table XXIII.


TABLE XXIII

DISTRIBUTION OF VARIANCE OF TOTAL OBSERVED
PUPIL GAIN ON UWH COMPOSITE IN
STUDIES ONE AND TWO

Source of Variance        Study One   Study Two
[Entries are illegible in the source; the row
labels that survive include GKMA, TRS,
Controlled Pupil Factors, and Error.]


This division of variance approximately rep- 
resents the manner in which the total variance 
in pupil gains distributes itself in the two 
studies. Except for rounding errors both 
columns add up to 100%. 
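
The arithmetic behind this division can be retraced with a short sketch. It assumes, as the text does, that the share of observed gain variance due to the controlled pupil factors is the squared multiple correlation, that the PGTA residuals carry the remainder, and that the communality and the specificity-and-error share from the factor analysis are fractions of that remainder. The function names are illustrative only, and the results are approximate because the source works from rounded figures.

```python
def real_and_error(reliability):
    """Split observed variance as the text does, by squaring the reliability."""
    real = reliability ** 2
    return real, 1.0 - real


def partition_gain_variance(multiple_r, communality):
    """Split total observed pupil-gain variance into three shares.

    multiple_r  : multiple correlation of controlled pupil factors with gain
    communality : fraction of the residual (PGTA) variance on common factors
    Returns fractions of the total observed gain variance.
    """
    pupil = multiple_r ** 2                   # removed by statistical control
    residual = 1.0 - pupil                    # left in the PGTA residuals
    teacher = communality * residual          # common-factor (teacher) share
    error = (1.0 - communality) * residual    # specificity-and-error share
    return {"pupil": pupil, "teacher": teacher, "error": error}


# Figures quoted in the text: R = .52 and communality 59% (Study One);
# R = .69 and communality 45% (Study Two).
study_one = partition_gain_variance(0.52, 0.59)
study_two = partition_gain_variance(0.69, 0.45)

for name, shares in (("Study One", study_one), ("Study Two", study_two)):
    print(name, {k: round(100 * v) for k, v in shares.items()})
```

Under these assumptions the sketch gives about 27% pupil, 43% teacher, and 30% error for Study One, and about 48%, 24%, and 29% for Study Two. The pupil and error figures agree with those in the text; the Study One teacher figure is only what this arithmetic implies.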




The first striking observation that this 
consideration leads to is that in both studies 
practically all of the pupil gain variance is 
attributable to either (1) pupil factors, (2) 
teacher factors, or (3) error variance with 
none left for specificity. This suggests that 
the pupil and teacher factors measured in 
these studies represent all of the potent fac- 
tors that operate to produce the observed 
pupil gains. It must be remembered however 
that the 15% attributed to PGTA in Study 
Number Two is perhaps more of a specific 
than a common factor. 

The manner in which these three kinds of 
factors operate in both studies is: 


Source of Variance      Study One     Study Two
Teacher Factors         [illegible]   24%
Pupil Factors           27%           48%
Error                   30%           29%


While teacher factors appear to be the pre- 
dominant cause of pupil gain in Study One, 
pupil factors appear to assume this role in 
Study Two. It seems therefore that the role 
of the teacher in producing pupil gains in 
social studies in the 7th and 8th grades in a 
one-room rural school is less important than 
in a school where the teacher is in charge of 
only these two classes throughout the entire 
day. The pupil in a one-room rural school “is 
on his own” to a much greater extent and 
hence the amount he learns is dependent upon 
his several abilities to a greater extent than 
upon the abilities of the teacher in charge of 
the room. The writer is inclined to believe 
that much of the difference observed between 
the common factor loadings of the PGTA in- 
dex on the primary teacher abilities in these 
studies is attributable to this basic difference 
in the two kinds of learning situations. 


CONCLUSIONS 
Four independent primary teacher abilities,
the common factors, satisfactorily explain the 
intercorrelations observed between a battery 
of measures commonly used in investigations 
of the nature, measurement, and prediction of 
teaching ability. These are: 


(1) A mental factor, GKMA: General 
Knowledge and Mental Ability Factor, 

(2) A supervisory rating factor, TRS: 
Teacher Rating Scale Factor, 


(3) A personality factor, PEA: Personal
Emotional Adjustment Factor, and




(4) An attitude factor, EATP: Eulogizing 
Attitude Toward the Teaching Pro- 
fession. 


None of these factors is common to all of the 
measures. In the two teacher samples studied, 
the same common factors emerged. These 
four primary teacher abilities are uncorre- 
lated with each other. 

Each of the several teacher measures is 
dependent primarily upon only one of the 
primary abilities. The common factor that 
each instrument is predominantly a measure 
of is summarized in Tables XXII, XI, and 
XXI. 


Teacher rating scales, although frequently 
used to evaluate the effectiveness of a teacher, 
are only slightly related to the observed pupil 
growth in social studies. The relationship 
does not appear large enough to warrant 
using supervisory ratings as a secondary cri- 
terion in studies dealing with the measure- 
ment of teaching ability, where teaching abil- 
ity is properly conceived in terms of the 
ability to promote pupil growth. 

This study reveals no single teacher 
measure that can validly be substituted for 
the actual measurement of pupil growth in 
evaluating the ability of the teacher to teach. 
In Study Number Two, the pupil gain teach- 
ing ability index, PGTA, was found to be 
essentially a specific factor in the battery of 
measures analyzed, with a small factor load-
ing of .394 on primary factor TRS. In Study
Number One, PGTA was principally depend- 
ent upon primary factor GKMA for an 
account of its variance, with negligible load- 
ings on the other three common factors. In 
four-room rural state graded schools (Study 
One), the amount pupils learn in social
studies is dependent upon teacher factors to
a larger extent than upon pupil factors, while 
in one-room rural schools (Study Two) the 
reverse is true. In the two-grade classroom, 
the largest pupil gains are observed for teach-
ers who rank high on primary factor GKMA
while in the one-room rural schools the pupil 
gains are dependent principally upon the 
pupils’ mentality, reading ability, and initial 
status. 

Although no single teacher measure can 
validly be substituted for PGTA in the meas- 
urement of teaching ability, the PGTA com- 
munalities of .64 and .46 in Studies One and 
Two indicate that PGTA can be predicted 
from all or a subset of the 18 or 19 teacher 
traits with maximum multiple correlations of 
.80 and .68 respectively (the maximum mul-
tiple correlation being equal to the square
root of the communality). 
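
The relation stated in the parenthesis can be checked directly. The sketch below simply takes the square root of each communality quoted in this paragraph; the function name is illustrative.

```python
import math


def max_multiple_r(communality):
    """Upper bound on the multiple correlation implied by a communality."""
    return math.sqrt(communality)


for study, communality in (("Study One", 0.64), ("Study Two", 0.46)):
    print(study, round(max_multiple_r(communality), 2))
```

For communalities of .64 and .46 this gives .80 and .68, the values cited above.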

All findings in this investigation apply, of 
course, only to the samples of teachers 
studied and to teaching ability as measured 
in social studies at the 7th- and 8th-grade 
levels in the kinds of rural schools that were 
involved here. 





IMPRESSIONS, TRENDS, AND FURTHER RESEARCH 


A. S. Barr 
University of Wisconsin 


The authors have presented in the several 
studies here reported a multiple approach to 
the measurement of teaching ability. Alto-
gether seven different studies have been pre-
sented, each with a separate approach and
summary of findings. It will be the purpose 
of this paper not to further summarize these 
several individual studies, but, rather, to pre-
sent an overview of the impressions, trends,
and needs for further research growing out of
them. Sometimes the by-products of a study
are more important than its primary out-
comes.

In the first place, it seemed apparent in 
planning this study that the criterion of 
teaching efficiency was tremendously impor-
tant; as the study is completed it seems even
more important. Further effort in this area 
should lead to further refinement of the cri- 
terion against which other measures of teach- 
ing ability may be validated. The major cri- 
terion used in this investigation has been that 
of pupil growth and achievement expressed as 
a composite of change scores. These change 
scores were found by subtracting the initial 
test scores from the final test scores on the 
several measures applied to the pupils, and 
further treated to secure statistical compar- 
ability. An attempt was made in the develop- 
ment of the criterion to measure not merely 
information outcomes, but also changes in 
attitudes, skills, and behaviors. To get infor- 
mation about the character of the measures 
applied to the pupils, the reader should exam-
ine these measures individually. The reader
will not, we are certain, be misled by the
labels employed in naming the instru-
ments; in some instances the labels are obvi-
ously quite misleading.


In choosing the criterion of pupil change 
as the primary criterion of teaching efficiency 
for this investigation it was perfectly clear to 
the investigators that teaching in the modern 
school involves much more than the guidance
of learning activities. It involves many im- 
portant teacher-pupil relations; teacher- 
teacher relationships; teacher-administrator 
relationships; and teacher-community rela- 


tionships and the many important responsi- 
bilities growing out of these. These relation- 
ships will limit in a significant respect the 
teacher’s success in a given situation and ulti- 
mately affect pupil growth and achievement. 
So important are these relationships that it 
would seem desirable to subject them to spe- 
cial investigation. This will have to be done 
at a later date. 

One of the interesting problems that arose
early in the attempt to develop a criterion of
teaching efficiency was that of handling gain
scores in some meaningful manner. In har-
mony with the findings of other investigators,¹
the gain scores were found to be negatively
correlated with such factors as intelligence,
socio-economic status, and reading ability,
ordinarily thought to limit pupil achievement.
To give proper weights to these factors, as it
would seem reasonable to do, a multiple re-
gression equation was set up in which each
individual pupil’s performance was predicted 
and the predicted scores subtracted from the 
actual gains to secure a criterion. In using 
these residuals as the criterion it has been 
assumed that certain other factors in pupil 
performance could be equalized. Not all of 
these factors in pupil performance could, how-
ever, be equalized. For those that were not
measured it was assumed that they were randomly
distributed. This is an important assumption
that will need careful consideration in future 
research. This is merely one of the many 
assumptions made in this investigation. 
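
The residual criterion described above can be sketched in a few lines. The example fits an ordinary least-squares equation predicting each pupil's gain from pupil factors and subtracts the predicted gain from the actual gain. The pupil records, the choice of two predictors, and the function names are invented for illustration; they are not the data or the computing routine of the studies reported here.

```python
def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    X = [[1.0] + list(r) for r in rows]            # prepend intercept column
    k = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def residual_gains(rows, gains):
    """Actual gain minus the gain predicted from the pupil factors."""
    coef = fit_least_squares(rows, gains)
    predicted = [coef[0] + sum(c * v for c, v in zip(coef[1:], r))
                 for r in rows]
    return [g - p for g, p in zip(gains, predicted)]


# Invented records: (intelligence score, reading score) and observed gain.
pupils = [(95, 40), (100, 48), (105, 55), (110, 52), (120, 66), (90, 35)]
gains = [12.0, 11.0, 9.5, 10.0, 7.0, 13.5]

residuals = residual_gains(pupils, gains)
print([round(r, 2) for r in residuals])
```

Because the equation includes an intercept, the residuals sum to zero; a positive residual marks a pupil who gained more than the measured pupil factors alone would predict.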

It has already been said that pupil change 
was employed as the major criterion of teach- 
ing efficiency in this series of investigations. 
Theoretically the criterion of pupil change 
seems sound. Actually, its use presents many 
very real difficulties, such as: How is one to 
know what the goals of teaching and learn- 
ing should be? How may one measure the 
outcomes of learning and teaching ade-
quately? And, how may one treat the data to
secure reliable results? To offset these diffi-
culties the criterion of pupil change was sup-
plemented in certain of the studies by super-


1 Charles A. Drake, “The Iota Function,” Journal of Edu-
cational Research, XXXIV (November, 1940), pp. 190-198.




TABLE I
SUMMARY OF VALIDITY COEFFICIENTS

[The coefficients of this table are not legible in the source.
Column criteria: Pupil Change, Supt. Rating, Super. Rating,
Teacher Tests, Teacher Ratings, and Three Personality, as
reported in the La Duke, Rostker, Rolfe, and Gotham composites.
Row measures: Wrightstone Abilities; American Council
Psychological; Hartman Social; Yeager Attitudes; Mental-Hygiene;
Teachers College Psychological; Community Planning; Health Test;
American Council Gov't. Civics; Bernreuter (B-n), (F-c), (B-d),
(F-s), and (B-s); Orientation; Alm-Sorenson (Composite);
Michigan Rating (Composite); Morris Trait; Washburne Social
Adjust. Invent.; Teacher Problems; Stanford (T-A), (A-R), and
(T-R); Harnly Purposes, Policies, Objectives, Methods, and Total;
Jackson Social; Wrightstone Civic; Teacher Pupil Relationship;
Sims Socio-Economic Status; Rudisill Scale; Personal Fitness;
Salary; Experience; Pupil Change.]






visory ratings and tests of qualities commonly 
associated with teaching success. There are 
considerable data in the studies reported to 
indicate that different things are measured by 
these different criteria. Gotham, for example, 
(Table I) found that a composite of tests 
of qualities commonly associated with teach- 
ing success correlated with pupil change with 
an r of only .13; a composite of personality 
ratings correlated with pupil change with an 
r of only .37; and a composite of teacher 
rating scales correlated with pupil change 
with an r of .40. Whatever may be said about 
the relative merit of these criteria, they do 
not measure, except in some very small 
amount, the same things. Many of the incon-
sistencies in the results of research in this
area can be traced in part to differences in 
the criteria employed. 

A study of the many correlations reported 
herein for each of the various tests and cri-
teria brings out, too, the well known fact that
coefficients of correlation cannot be taken at
face value.² Each correlation is in part the
product of extraneous features of the situa- 
tion and in part the product of the criterion. 
The correlations for the Washburne Social 
Adjustment Inventory, for example, vary 
from .06 to .47 depending upon the criterion 
and the setting from which the data were de- 
rived. The correlations for the American 
Council Psychological Examination ran from 
-.12 to +.57. Rolfe, using the Gotham-
Carlson data, reported, for example, a corre-
lation of -.10 for the American Council Psy-
chological Examination with pupil change as
the criterion, which is considerably lower than 
that reported by Rostker (.57) and LaDuke 
(.54). The low correlation reported by Rolfe 
may be traceable to the manner in which this 
particular test was administered or to some 
other extraneous factor. A variation of a 
different sort is that found in Gotham’s data 
for the Washburne Test of Social Adjustment 
where three different sorts of criteria were 
employed. The correlation of the Washburne 
Scale with pupil change is .06; with a com- 
posite of three rating scales thrice applied, 
.40; with a composite of tests of qualities 
commonly associated with teaching success, 
.47. The latter correlation can probably be 
explained by the fact that it is common prac- 
tise to standardize teacher tests against super- 
visory rating as a criterion, and these tests 


2 A. S. Barr, “The Coefficient of Correlation,” Journal of
Educational Research, XXIII (January, 1931), pp. 55-60.




were probably so standardized. The incon-
sistencies for the Harnly Statements about
Education are particularly interesting. When
a criterion of rating scales was used the cor-
relation was .39; but when pupil change,
weighted heavily by the things measured by
the Wrightstone Tests, was used as the
criterion, the correlation was -.02; the
correlation of part IV (method) with
pupil change was -.31. Apparently, while
supervisors preferred the more progressively
inclined teachers, the more they were so in-
clined the poorer the results, judged by the
criterion of pupil change as here measured.
In general, an understanding of the essential
relationships in teacher prediction would 
seem to arise not from the observation of 
single coefficients of correlation, but from the 
study of agreements, disagreements, and 
trends as well. 

Some persons may be disturbed by the low 
coefficients of correlation found in this and 
other investigations. The results of the re- 
searches in this area even with low coeffi- 
cients of correlation are not, however, without 
significance. In the first place, low coefficients 
may arise from selective factors operating in 
the samples studied. The results obtained 
from applications of intelligence tests seem
to illustrate this point; here low coefficients 
of correlation may merely indicate that most 
of the persons graduated from our teacher 
training institutions have enough intelligence 
to teach and that the range of talent has been 
definitely narrowed through selective factors 
operating throughout the school program in 
elementary, secondary, and higher education 
alike. To conclude here, as some have done,
that low coefficients of correlation indicate an
absence of relationship between intelligence
and teaching efficiency would seem to be erro-
neous. A similar line of reasoning might be
applied to other measures employed in this 
and other investigations. 
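
The effect of a narrowed range of talent on a coefficient of correlation can be illustrated by a small simulation. Everything in the sketch is hypothetical: the population, the strength of the relationship, and the cutoff of 110 are invented to show the mechanism and are not estimates drawn from these studies.

```python
import random


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(0)

# Hypothetical unselected population in which efficiency depends
# partly on intelligence and partly on everything else.
intelligence = [random.gauss(100, 15) for _ in range(5000)]
efficiency = [0.5 * (iq - 100) / 15 + random.gauss(0, 1)
              for iq in intelligence]
r_full = pearson_r(intelligence, efficiency)

# Retain only those above a cutoff, mimicking selection through schooling.
selected = [(iq, e) for iq, e in zip(intelligence, efficiency) if iq >= 110]
r_selected = pearson_r([iq for iq, _ in selected],
                       [e for _, e in selected])

print(round(r_full, 2), round(r_selected, 2))
```

The underlying relationship is the same in both groups, yet the coefficient computed on the selected group is markedly lower, which is the point made above about graduates of teacher training institutions.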

One of the most potent causes of low cor-
relations will be found, in the judgment of
the writer, not in the above named cause, but 
in the nature of teaching ability itself. Teach- 
ing is a very complex activity, composed of 
many parts. Within the unity of the function- 
ing whole there are numerous components no 
one of which, except within very broad limits, 
contributes greatly to teaching efficiency. 
Taken together however, they produce the 
effect observed. This observation seems to be
supported by [illegible]ews,³ when he reports that
the individual items of valid tests may not
discriminate between good and poor teachers.
The conventional item analysis technique
thus seems not to work here; at least the dis-
criminating power of individual test items
seems to be lost in errors of measurement or
something of the kind. A very careful study needs to be
made of this problem in the measurement of 
teaching ability. 


By and large, the overall picture and future
for the measurement of teaching ability seem
promising. Studies by Rostker, Rolfe, and
LaDuke seem to show that when enough of 
these tests are put together a very satisfac- 
tory multiple correlation may be secured. 
Rostker found, for example, from a regression 
equation involving fourteen measures, a mul- 
tiple R of .84; Rolfe, using eleven measures 
secured a multiple R of .65; LaDuke using 
four measures, a multiple R of .80. With fur- 
ther refinement the aggregate result should be 
quite satisfactory and of great practical im- 
port. When one considers the amount of in- 
formation that one might collect in a rela- 
tively short testing period from the use of a 
battery of carefully chosen measures, the 
future use of such measures would seem 
promising. 

While the statistical-test approach here de- 
scribed seems to promise worthwhile results, 
the logic of the situation also seems to sug- 
gest certain very definite upper limits to the 
correlations that may be secured from appli- 
cations of this technique, even with greatly 
refined instruments of measurement. Our
ability to apply the statistical test approach 
to the measurement of teaching ability rests 
upon the fact that there are common factors 
in persons and situations which, on the sur- 
face, may appear quite diverse. The extent to 
which these common factors may be discov- 
ered is one of the problems of this area. In 
addition to these common elements, each per- 
son or situation may present or require some 
unusual attitude, knowledge, special skill, or 
particular personal quality. These special as-
pects of particular learning-teaching situa-
tions impose definite limitations upon the
statistical-test approach. These limits seem
not, however, to have been as yet reached.


This point can be further illustrated from 
the field of the techniques of learning. The
choice of learning and teaching procedures
should be made not merely upon the basis of 
certain general considerations but also because 
of many particular or special considerations. 
First of all, the choice will depend upon one’s 
purpose. From this point of view, that course 
of action is good, broadly conceived, that 
serves one’s purpose. Some means are used 
more frequently than others, but no means 
can be said to be generally better than an- 
other divorced from purpose. A particular 
means may never be used again, but for the 
particular end sought it may be superb. From 
this point of view much of the experimental 
study of method seems poorly conceived. To 
illustrate and explain this point further we 
may put the question: Which is better, a 
handsaw or a screw driver? The answer is: It 
depends upon what one wants to do. Second, 
leaving this point, what one does will depend 
upon the persons involved. Teachers, pupils, 
and parents are not all alike, and they will not 
all respond equally well to the same approach.
This in no way denies that there are
factors common to all individuals; it asserts, rather,
that there are likenesses and differences and
that both need to be considered in choosing what
to do. Third, one’s choice of what to do de-
pends upon the principles that one holds to
be true. Principles of fair play, functionality, 
and interest are merely verbalized statements 
of observed uniformities in human behavior. 
These are the common factors or elements 
referred to above. Educational psychology, 
philosophy, and mathematics abound in gen- 
eralizations of this sort. While these general- 
izations present merely one set of factors to 
be considered in the choice of means, courses 
of action, or techniques, they are important. 
They constitute the bulk of the content of 
professional education. Finally, what one does 
will depend upon other special and limiting 
aspects of the particular situation at hand. 
The purposes, persons, and principles may
suggest that one use some particular sort of 
equipment such as, for example, a globe, but 
if there is no globe and one cannot be secured, 
then one must alter the course of action and 
use an apple or something else. There are 
many such special aspects of the immediate 
learning-teaching situations to be found in 
community resources, school buildings, sup- 
plies, and equipment that will need to be con- 
sidered in the choice of courses of action. 


3 [Author, title, and year illegible], thesis on file, Library,
University of Wisconsin.

The point of this discussion is that teaching







efficiency is determined in a very real sense 
by the special aspects of the situation at hand, 
as well as the elements common to a number 
of situations.


Jayne’s study seems to throw light upon 
this important aspect of method and the 
measurement of teaching ability. When pro- 
cedures were divorced from their settings and 
correlated with the criterion of pupil change 
only 20, or 8%, of the 504 relationships were 
found to be statistically significant. This re- 
sult may have arisen in part because of the 
small part played by any one of the tech- 
niques studied as already pointed out, but it 
may have arisen too in part from the neglect 
of the appropriateness factor in the choice of 
techniques. The 8% for which statistically 
significant correlations were found were for 
the most part of a more general sort, such 
as, for example, the percent of teacher and 
pupil talk; the sorts of questions asked by 
teachers and pupils; comments upon the im- 
portance of contributions; and correctness of 
responses rather than particular acts. When 
seven of these items were combined into an 
index of meaningfulness a correlation of .63 
was secured with pupil change; and .81 for 
a corrected score and the criterion. One of the 
most important contributions to the nature 
of method in Jayne’s study is seen when the 
results from his two investigations are com- 
pared. Under the conditions of the first study, 
with its broad purposes, a high positive cor- 
relation was found, as reported above, be- 
tween the index of meaningfulness and pupil 
gain. In the second investigation, with its 
emphasis upon factual learning, the signs of 
the coefficients shifted from positive to nega-
tive, with high negative coefficients of cor-
relation of -.66 and -.55. Jayne attributes
this change to a shift in purpose. If this
interpretation is correct, the results would seem to underline the fact
that means must be adapted to purposes and 
other limiting aspects of the situation if best 
results are to be secured. 


Gotham made a special study of the teach-
er’s personality as a factor in teaching suc-
cess. No significant relationship was found
between the criterion of pupil change and
such personality inventories as the Bernreuter
Personality Inventory, Bn, (r = -.14), the
Washburne Social Adjustment Inventory (r
= .06), and the Rudisill Scale for the Meas-
urement of the Personality of Elementary
School Teachers (r = .03). The correlations
were higher when a composite of supervisory 
ratings was used as the criterion. A multiple 
correlation of .40 was found when seven 
composites of eighteen teacher personality 
measures were employed in a regression equa-
tion for predicting pupil gain. Almost all of the
increase in the correlation is traceable to the
supervisory ratings, since they alone corre- 
lated with pupil gains with an r of .40. 

These results seem important in more than 
one respect. In the first place it seems appar- 
ent that much more work needs to be done 
in this area. Personality is poorly defined. If 
there are components these need to be de-
fined. Whether there are or are not, the tests
seem to be based on this assumption. As far 
as the ratings are concerned it would be diffi-
cult to know what qualities the supervisors
who rated these teachers had in mind. Some 
persons have thought that the score was pri- 
marily a personal charm score or one of 
mutual agreement; whatever it is, it does not 
correlate to any considerable extent, as 
already pointed out, with pupil growth and 
achievement. It is interesting to note, too, 
that whatever they may have had in mind 
the intercorrelations between their ratings, 
regardless of the measures employed, are 
quite high (.72 to .93). Possibly this would 
seem to indicate some central thought or idea 
running through these ratings. If personality 
is defined by its effects it might seem best to 
study how it functions in maintaining better 
working relations and in getting better results. 
Subjectively, it would seem that there may be 
two sets of qualities involved in these person-
ality effects. In the first place there are the
fixed physical qualities such as height, weight, 
regularity of features, color of hair and eyes, 
and the like; then there are the variable pat- 
terns of social behavior that lead people to 
label individuals as honest, reliable, consid- 
erate, and so on. The effects of both of these 
upon evaluators depend in part upon the
cultural pattern in which we move. Both sets 
of factors need to be considered, in future 
research, (1) in relation to the cultural pat- 
tern of the community in which teachers 
work, and (2) in relation to the personal 
idiosyncrasies of raters. The point here is that 
personality would seem to need a frame of 
reference. 


Another interesting fact brought out by 
Gotham’s data is that nothing was gained
from the application of the multiple regres-
sion technique. The result may, of course, 
have been colored by the character of the cri- 
terion, but it may indicate too that more 
work needs to be done in refining measuring
instruments in this area. With more adequate 
measures of personality and better orienta- 
tion the results might be quite different. The 
relationships between personality measures 
and pupil growth might be much more satis-
factory, too, if the criteria emphasized more
than they now do the social and personal
growth of pupils.

When one thinks about these very human 
aspects of teaching and learning it is difficult 
not to speculate upon where method leaves 
off and personality begins or vice versa. The 
line seems imperfectly drawn; possibly our 
conception of method is a very limited one, 
and with more attention to personal and 
social growth the line of demarcation might 
be removed altogether. This needs more 
attention. 


Von Eschen tried to change teachers and
pupils in certain important respects thought
to be associated with teaching efficiency. His
efforts to improve teachers were centered
upon: knowledge of the subject; social in- 
formation and attitudes; attitude toward 
teachers and the teaching profession; and 
teacher-pupil relations. Tests were given in 
these areas at the beginning and end of the 
school year. Significant changes were obtained 
in information about social issues and teacher- 
pupil relations. The changes in knowledge of 
subject, social attitudes, and attitudes toward 
teachers and the teaching profession were not 
statistically significant. The improvement 
program as it related to the pupils pertained 
to reading and basic study skills. Statistically 
significant changes were made in these areas, 
but in no instances were these changes sig- 
nificantly correlated with pupil change as 
measured by the tests employed in this inves- 
tigation. The supervised groups did exceed 
the unsupervised group, however, with sta- 
tistically significant differences, for the unit 
tests on health; the Wrightstone Research; 
the Wrightstone Ability to Apply Generaliza- 
tions to Social Studies Events; and the Hill 
Test of Civic Information. These statistically 
significant changes may have arisen, however, 
from unmeasured teacher and pupil change. 




One of the purposes of this investigation
was to discover measures that correlated
significantly with teaching efficiency. In look-
ing over the data (Table I) the following
measures seem to give promise of satisfactory
results: for intelligence, the American Council
Psychological Examination with r’s of .57,
.53, and -.10; for interest and motivation, the
Yeager Scale for Measuring Attitude Toward
Teachers and the Teaching Profession with
r’s of .45, .16, and .22; for a knowledge of
subject matter (in this case the social studies)
the American Council Test of Government
and Civics (with r’s of .36 and -.01) and the
Wrightstone Abilities Test with an r of .58;
for social attitudes and information, the Hart-
man Test of Social Attitudes with r’s of .52
and .38; for professional knowledge and judg-
ment, the Harnly Statements about Educa-
tion (Educational Methods) with an r of
-.32; for emotional stability, the Washburne
Social Adjustment Inventory with r’s of .40
and .47 with supervisory ratings and .06 and
.13 with pupil change and the Bernreuter Per-
sonality Inventory (Bn) with r’s of -.14 and
-.31; for skill in teacher-pupil relations, the
Torgerson Teacher-Pupil Relations Test with
r’s of .22, .35, and .45; and for personality
any one of the several teacher rating scales
employed in this investigation with r’s of .23
to .43.
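
Each r quoted above is a product-moment correlation between a teacher measure and a criterion of teaching efficiency. As a minimal sketch of that computation, assuming paired scores for the same teachers, the following uses invented scores; the names `pearson_r`, `test_scores`, and `criterion` are illustrative, and none of the numbers come from Table I.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    ss_x = sum(dx * dx for dx in dev_x)
    ss_y = sum(dy * dy for dy in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return cross / math.sqrt(ss_x * ss_y)


# Invented scores for five teachers (illustration only).
test_scores = [1, 2, 3, 4, 5]  # e.g. a score on some teacher test
criterion = [2, 1, 4, 3, 5]    # e.g. a supervisory rating or pupil-change score
r = pearson_r(test_scores, criterion)
print(round(r, 2))
```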


One of the problems of research in this 
area is to reduce the number of measures to 
a more manageable list. Hellfritzsch took this 
as the purpose of his study. Four, possibly
five, factors seemed to suggest themselves 
from his analysis and the tests employed in his 
study, namely, (1) information; (2) the qual- 
ities considered important by raters in evalu- 
ating teaching efficiency; (3) emotional bal- 
ance and adjustment; (4) sympathetic attitude 
toward teaching and possibly toward people 
in general; and (5) possibly some specific 
teaching ability factor. His analysis and that
of LaDuke suggest the possibility of greatly
reducing the number of different measures and
indicate the area in which more work might be
done with profit. There are probably other areas,
such as those of drive and skill in speech, not
covered by the tests in his study.
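
The idea of replacing many intercorrelated measures with a few factors can be sketched as follows. This is not Hellfritzsch's procedure, whose computational details are not given here; it is a simple principal-component-style calculation (power iteration) on a hypothetical correlation matrix, showing how much of the total variance of three invented measures a single component would carry.

```python
import math


def first_component(corr, iterations=200):
    """Largest eigenvalue and unit eigenvector of a correlation matrix,
    found by power iteration (assumes a symmetric matrix whose dominant
    eigenvalue is positive)."""
    n = len(corr)
    vec = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(p * p for p in product))
        vec = [p / norm for p in product]
        eigenvalue = norm  # length of corr * unit vector approaches the eigenvalue
    return eigenvalue, vec


# Hypothetical intercorrelations among three measures (not from the study).
corr = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
eigenvalue, weights = first_component(corr)
share = eigenvalue / len(corr)  # proportion of total variance on one component
print(round(eigenvalue, 2), round(share, 2))
```

A large share for the first component would suggest that the measures overlap enough to be summarized by fewer scores; a small share would suggest they tap distinct qualities.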


In looking forward to further research cer- 
tain suggestions may not be out of place: 


(1) One of the most important aspects of 
the work in this area is the criterion of teach-
ing efficiency. Obviously if the criterion is
faulty, that which follows is inconsequential.
Whether supervisory ratings can be justified
as a criterion remains yet to be determined.
Whatever their value it is quite evident from
the data here provided that they measure
something quite different from pupil change. 
Whether this will continue to be so with the 
inclusion of more measures of social and per- 
sonal growth in the criterion of pupil change 
and with the further refinement of rating 
techniques remains to be determined by fur- 
ther study. 

(2) For the most part the criteria em-
ployed in this study are of the composite sort;
there are numerous exceptions, as in the
studies by Rostker, LaDuke, Gotham, and
Von Eschen. It would seem important, how-
ever, to study in further research the rela-
tionship between various teacher qualities
and various sorts of educational outcomes.
The data here presented would seem to sug-
gest that certain teacher qualities and actions
are prerequisite to certain outcomes and other
qualities to yet other outcomes. Possibly we
need differential prediction in teacher per-
sonnel research as in pupil personnel research.

(3) To be practicable the measures in this
area must be reduced in number, if possible, 
to a few more inclusive ones.

(4) Teaching is a very human activity.
Possibly the teacher should set a better ex-
ample of good human relationships, of social
values, and good citizenship. To explore this
realm as it should be, better measures of these
human qualities need yet to be developed.


(5) The success of the teacher depends in
no small part upon the extent to which what she
has to offer fits into the expectancy of the
pupils, parents, and school officials in the com-
munity in which she works. These individual
determiners of teaching efficiency need fur-
ther study.


(6) There are numerous approaches to the
measurement of teaching efficiency, such as
measures of qualities commonly associated
with teaching efficiency; measures of the
knowledges, skills, and attitudes thought
essential to teaching efficiency; measures of
teacher behavior and performance; and pupil
growth. For each of these approaches a
variety of measuring devices have been em-
ployed: tests, rating scales, behavior records,
questionnaires, interviews, inventories, auto-
biographies, etc. The interrelationships among
these measures and approaches to the evalua-
tion of teaching efficiency need further study.


(7) Some factors in teaching efficiency
needing further exploration and study are:
drive, expressiveness, academic aptitude, cul-
tural background, knowledge of subject, atti-
tude toward teachers and teaching, social
sensitivity and proficiency in human rela-
tions, teacher-pupil relations, professional
judgment, and emotional adjustment.
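
Suggestion (2) above, that different teacher qualities may predict different outcomes, can be sketched by correlating each quality with each outcome separately rather than with one composite criterion. Everything in this sketch is hypothetical: the quality and outcome labels, the scores, and the helper names are invented for illustration and do not come from the studies reported here.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


def best_predictor_per_outcome(qualities, outcomes):
    """For each outcome, name the quality with the largest absolute r."""
    best = {}
    for outcome_name, outcome_scores in outcomes.items():
        best[outcome_name] = max(
            qualities,
            key=lambda name: abs(pearson_r(qualities[name], outcome_scores)),
        )
    return best


# Invented scores for five teachers (illustration only).
qualities = {
    "quality_a": [1, 2, 3, 4, 5],
    "quality_b": [2, 1, 4, 3, 5],
}
outcomes = {
    "outcome_x": [2, 4, 6, 8, 10],
    "outcome_y": [4, 2, 8, 6, 10],
}
best = best_predictor_per_outcome(qualities, outcomes)
print(best)
```

With a single composite criterion the two qualities would look roughly interchangeable; taken outcome by outcome, each quality is the better predictor of a different result, which is the sense of differential prediction.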








